ABSTRACT



Protein splicing is an autocatalytic reaction in which an internal protein segment, the
intein, is excised and the two flanking sequences (exteins) are ligated. Inteins splice
proteins extremely efficiently and have been utilized for biotechnological applications.
Characterizing intein reaction mechanisms is important for understanding how and why
certain mutations may be used to control the splicing and cleavage reactions, and how the
reaction rate can be tuned without affecting the mechanism. In combination with
crystallographic structures and with both site-directed and random mutagenesis, we have
studied the reaction mechanisms for intein splicing and for cleavage at the C-terminus
using first-principles quantum mechanical simulations.
Previous experimental studies have shown that mutation of a critical N-terminal residue
of the intein inhibits splicing. Despite this inhibition of the overall splicing reaction,
peptide bond cleavage at the C-terminus may still occur independently. With an aspartate
to glycine mutation, the "cleavage mutant" was found to react more rapidly in a low pH
environment. We have characterized the pH-dependent C-terminal cleavage reaction, studied
the effect of mutation on the energy barrier, and provided, for the first time, an
atomic-level understanding of this important process. Next,
we have extended our computational study to the overall intein splicing mechanism. The
splicing reaction is a highly synchronized chemical process in which mutation can
accelerate, decelerate, or partially or completely inhibit individual steps. We have
focused on the energetic effect of mutation on the reaction profile and on the
corresponding protein structure. From this, we have proposed a splicing mechanism that
utilizes the highly conserved amino acids and explicitly describes the behavior of
protons. An explanation for the experimental inhibition of splicing by the aspartate to
glycine mutation is also presented. In summary, with a series of quantum mechanical
calculations ranging from gas phase, to an implicit solvent scheme, to combined
quantum/classical simulations, we have provided insight into some of the key steps of
intein reactions. These studies may be exploited for many applications involving inteins,
including molecular switches and sensors as well as controlled drug delivery.
CHAPTER 1
INTRODUCTION 

1.1 Computer simulations in chemistry 

Nestled between experiment and pure theory, computational chemistry has become an
integral tool for researchers working in physics, chemistry, and biology, as well as in
nanotechnology and biotechnology. Simulation can verify and confirm experimental results,
or it can predict outcomes and suggest new experiments. Computer simulations allow the
researcher to access states both visible and invisible to experiment, and to make
predictions based on this knowledge. Experimentally, a chemical reaction may be quantified
by the amounts of reactants and products and the time elapsed. Explaining the mechanism,
molecular structures, and energies at the atomic level requires computational methods.

The field of computational chemistry spans many length and time scales, and the desired
level of accuracy must be known before running simulations. To simulate protein folding,
which requires an extremely long simulation trajectory, amino acids may be
"coarse-grained," so that the atomic description of each side chain is aggregated into a
composite unit. This and other approximations are essential for achieving long
trajectories. However, calculating the pKa of a side chain or the chemical shifts measured
by nuclear magnetic resonance (NMR) requires more than an atomic-level description. It
also requires a method that can calculate observable properties from first principles.

1.2 Computational quantum mechanics 

The energies associated with bond breaking and formation are an essential property of
enzymatic processes. For example, near room temperature a change in energy barrier of
~1.4 kcal/mol corresponds to an order of magnitude change in the reaction rate. States
observed at equilibrium may be predicted from the relative energies of structures. Accessing
the energy of the system computationally, for transition states as well as equilibrium
structures, requires first-principles electronic structure calculations. In an all-electron
method, the electron orbitals are variable and flexible, and they depend on the neighboring
atoms and environment. This matters because the chemistry at transition states may differ
greatly from that of equilibrium structures: instead of four bonds, carbon atoms may have
three or five bonds during a chemical reaction. Transition states are where quantum
mechanical principles dominate. By solving the Schrödinger equation for all electrons, and
relaxing their orbitals and therefore allowing the electron density to vary, an accurate
description of the system can be obtained. This description is useful for understanding
fundamental chemistry both near and far from equilibrium.
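The ~1.4 kcal/mol rule of thumb can be checked with a short calculation. The sketch below assumes an Arrhenius/Eyring-type rate law, k ∝ exp(-ΔE‡/RT), in which only the barrier changes and the prefactor is unaffected. The temperature (298.15 K by default) is an illustrative choice, not a value taken from this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def rate_ratio(barrier_change, temperature=298.15):
    """Factor by which the rate grows when the barrier drops by barrier_change (kcal/mol).

    Assumes k = A * exp(-dE / (R T)) with the same prefactor A before and after.
    """
    return math.exp(barrier_change / (R_KCAL * temperature))

def barrier_for_tenfold(temperature=298.15):
    """Barrier change (kcal/mol) that changes the rate by exactly a factor of ten."""
    return R_KCAL * temperature * math.log(10.0)

print(f"{barrier_for_tenfold():.2f} kcal/mol")       # 1.36 kcal/mol at 298.15 K
print(f"{barrier_for_tenfold(310.0):.2f} kcal/mol")  # 1.42 kcal/mol at 310 K
print(f"{rate_ratio(1.4):.1f}x faster")              # 10.6x faster
```

RT ln 10 grows linearly with temperature. The ~1.4 kcal/mol figure therefore corresponds to temperatures between room and physiological temperature.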

1.3 Inteins 

1.3.1 Overview 

Protein splicing involves the autocatalytic release of a peptide segment, termed an
intein, with the joining of two flanking protein sequences (exteins) [1][2]. Inteins are
autocatalytic proteins that exist in all three domains of life. Experiments have identified
key reaction steps in protein splicing, whereas sequence comparisons have revealed the
conserved amino acids required for this reaction. Figure 1.1 shows a schematic of the
conserved intein residues and their corresponding block (C or N) designation. Experimental
mutational studies have been carried out to further control the protein splicing reaction
[3][4]. For example, mutating the first residue at the N-terminus (N1 block) of the intein
from Cys to Ala (N1-Cys1Ala) inhibits the first step of the splicing reaction, namely the
N-terminal N-S shift¹, and thus isolates the C-terminal cleavage reaction [5]. Mutation
schemes that control the reaction rate and/or the specific products could be exploited in
many biotechnological applications such as bioseparations [6][7], drug development [8],
and molecular sensors [9][10].

1.3.2 Computational methodology 

First-principles density functional theory was used to study the electronic structure of
protein systems. Various methods were used to describe the intein model system
accurately: the electrostatic environment was either neglected (gas phase), approximated
(implicit solvent), or treated explicitly (QM/MM). Various levels of theory were used:
classical molecular mechanics, semi-empirical quantum mechanics, and first-principles
quantum mechanics.

¹Atoms are annotated with one letter, e.g. H = hydrogen. Amino acids are annotated with
three letters, e.g. His = histidine.






[Figure 1.1 schematic: N-extein (host protein) | intein, with the N-terminal splicing domain
(blocks N1, N2, N3, N4) and the C-terminal cleavage and splicing domain (blocks C2, C1) |
C-extein (host protein)]

Figure 1.1: Schematic of an intein and the N- and C-exteins. Splicing motifs contain highly
conserved amino acids, such as N1-Cys1, N3-His10, C2-Asp5, C1-His7, C1-Asn8, and Cys+1.



[Figure 1.2 panels: Splicing; Cleavage]

Figure 1.2: Intein reactions: splicing and cleavage (after N1-Cys1Ala mutation). C, A, H, N,
and N* represent Cys, Ala, His, Asn, and succinimide, respectively.



1.3.3 Results: Splicing 

We have studied the protein splicing mechanism of inteins. The role of mutations in the
Mtu recA intein is considered, especially the C2-Asp5Gly mutation that inhibits splicing
and creates the C-terminal cleavage mutant (CM). The proposed mechanism is consistent with
crystal structures [11][12][13][14][15][16][17], NMR structures [18], and mutagenesis results
[19][20][21][22][23]. It also includes an atomic-level description of the steps of intein
splicing. This mechanism gives a detailed understanding of a reaction that is useful for
biotechnology applications.
Experiments have identified key reaction steps in protein splicing and have revealed the
conserved amino acids required for this reaction. Figure 1.2 shows the intein splicing
reaction in terms of the conserved amino acids. Intein splicing starts with an N-S shift and
thioester formation at the N-terminal N1-Cys1 (see Appendix A for all natural amino acid
structures). Next is transesterification, in which the C-terminal Cys+1 attacks the carbonyl
carbon atom of the N-terminal thioester. Then C1-Asn8 cyclization occurs. Finally, the S-N
shift yields the separate products: the fused N- and C-exteins and the released intein
sequence.

1.3.4 Results: C-terminal cleavage 

We investigated the cleavage of the peptide bond between the intein and the C-extein. This
step was isolated by inhibiting the other steps of the overall splicing reaction, and it was
found to occur more rapidly in a low pH environment. Our first-principles calculations give
an atomic-level description of C-terminal cleavage, using a hydronium ion to model the
pH-induced reaction. Various model systems are used, and the results of these calculations
agree well with experiment.

Experimental mutagenesis studies have been carried out to further control the protein
splicing reaction (see Figure 1.1, and Appendix B for intein sequences). For example, the
N1-Cys1Ala mutation of the first residue at the N-terminus of the intein inhibits the first
step of the splicing reaction, namely the N-terminal N-S shift. This isolates the C-terminal
cleavage reaction (see Figure 1.3) [5].

1.3.4.1 Environmental effects of single amino acid mutation 

In a related context, experimental results for the Mycobacterium tuberculosis (Mtu) recA
mini-intein have shown that the C2-Asp5Gly mutation² creates the highly active C-terminal
cleavage mutant (CM). This mutant was reported to be more active in a low pH environment
[19][24][25][26].

Interestingly, once splicing is inhibited, the C-terminal Cys+1 residue (the first amino
acid of the C-terminal extein, or C-extein) is found to be functionally unnecessary. Wood et
al. have found that this amino acid regulates the reaction rate but does not alter the
mechanism [21]. Furthermore, since the CM is exceedingly reactive in a low pH environment,
they have utilized Met, which is the native N-terminus



²C2-Asp5 indicates aspartate at the fifth position of the C2 block. C2-Asp5Gly indicates that
aspartate is mutated to glycine. This mutation was previously referred to as D422G.
[Figure 1.3 panels: (a) Precursor; (b) Splicing; (c) Fast C-cleavage; (d) Slow C-cleavage]

Figure 1.3: Different reaction rates and reaction products are observed for various
mutations to the intein or extein sequence. Intein and extein precursor (top left) and two
possible types of product: the splicing product (top right), and fast (bottom left) and slow
(bottom right) C-terminal cleavage.

of the C-extein sequence, to slow down the reaction by an order of magnitude. In this
experiment, three proteins of various sizes were analyzed with the Cys(+1)Met C-extein
mutation: Thymidylate Synthase (31.5 kDa), Hfq Protein (18 kDa), and rhaFGF (14 kDa). For
these proteins, the ratios of reaction rates with and without the Cys(+1)Met mutation were
found to be 12.0, 5.0, and 7.86, respectively [U]. Figure 1.3 shows a schematic of the intein
precursor and products based on these results [10][27]. However, the exact mechanisms that
govern the splicing and cleavage reactions are not understood at the atomic level.
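Read in the opposite direction, the same rate relation turns the measured Cys(+1)Met slowdowns into apparent barrier increases. This is a rough sketch under stated assumptions. Each ratio is taken as k(Cys+1)/k(Cys(+1)Met), and the prefactor is assumed unchanged. The temperature (298.15 K) is a placeholder, because the experimental temperature is not given here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def apparent_barrier_shift(rate_ratio, temperature=298.15):
    """Barrier increase (kcal/mol) implied by a slowdown of rate_ratio.

    Inverts k_fast / k_slow = exp(dE / (R T)), assuming equal prefactors.
    """
    return R_KCAL * temperature * math.log(rate_ratio)

# Measured rate ratios quoted in the text (interpreted here as slowdown factors)
ratios = {"Thymidylate Synthase": 12.0, "Hfq Protein": 5.0, "rhaFGF": 7.86}
for protein, ratio in ratios.items():
    print(f"{protein}: {apparent_barrier_shift(ratio):.2f} kcal/mol")
```

Under these assumptions, the three ratios correspond to barrier increases of only about 1 to 1.5 kcal/mol. That is a small energetic change for a large effect on the rate.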

To obtain an atomic-level understanding of the reaction mechanisms and of the effect of
mutation on the reaction barrier, we have carried out detailed quantum mechanical
simulations of intein C-terminal cleavage reactions. We describe pH-dependent C-terminal
cleavage calculations for the Mtu recA intein, performed with semi-empirical, QM gas phase,
QM implicit solvent, and combined QM/MM methods [28][29]. Harnessing the C-terminal cleavage
reaction may allow for an intein-based delivery device, in which a specific stimulus
triggers the reaction.

1.4 Research outline 

The goal of our research is to characterize intein splicing and cleavage reaction
mechanisms for use in biotechnology applications, and to complement the experimental
efforts of collaborators at RPI and the Wadsworth Center. Understanding the splicing and
cleavage mechanisms at the atomic level will inform the engineering of smaller, faster, and
controllable intein reactions. In addition, a synthetic intein is appealing for
biotechnological applications because parameters such as size, reaction, and function
could be controlled [30].

Our computational results indicate that certain mutations either inhibit or enhance
specific steps of the overall splicing reaction, a conclusion that is consistent with
experiment. With quantum mechanical simulations, intermediate states may be isolated and
studied in the context of the molecular triggers and inhibitors that affect protein
splicing by inteins. First-principles methods make it possible to study the precursor,
intermediate, and product states, which is extremely useful.



CHAPTER 2 
COMPUTATIONAL METHODS 



2.1 Introduction 

The field of computational chemistry is extremely broad and includes many methods that
encompass a wide range of length and time scales. For example, to simulate large-scale
protein structural rearrangement, each atom can be represented as a partial point charge
centered on the atom. Forces are calculated from the potential energy of the 'sea' of point
charges, and Newton's equations of motion are then integrated, which makes atomic
trajectories over long time scales achievable. One issue with this parameterization of the
nuclei and electrons is that once the parameters are set, typical calculations do not allow
"on the fly" re-parameterization. Polarizing the partial charges, or breaking and forming
covalent bonds, adds a further level of complexity. In enzyme catalysis, the breaking and
forming of bonds is critical. During these reactions, the atoms pass from an energetically
stable state through a transition state and enter a new stable state. The chemistry of these
intermediate states is atypical (e.g. C atoms may have more or fewer than four bonds), hence
the elevated energy of the system. Figure 2.1 shows a sample reaction energy profile for
enzyme catalysis.



2.2 Many-body quantum mechanics 

To understand both the geometric and the electronic structure of the intermediate states,
which is essential for accurate prediction of energy barriers and reaction rates, we have
used computational quantum mechanics. In these first-principles calculations, the
all-electron wave function is optimized self-consistently in order to predict chemical
properties and structures. We first discuss approximate solutions to the many-electron
system, starting with the one-electron Hamiltonian.

2.2.1 One-electron Hamiltonian 

For non-interacting electrons, we can write the total system Hamiltonian as the sum of
one-electron Hamiltonians:

$$H = \sum_{i=1}^{N} h_i \qquad (2.1)$$
[Figure 2.1 plot: energy versus reaction coordinate]

Figure 2.1: Illustrative energy barrier for an enzymatic process (one-dimensional reaction
coordinate). The difference in energy between the reactants (E_a) and the barrier (E_b) is
known as the energy of activation, ΔE‡, and governs the rate of reaction. The difference in
energy between the reactants and the products (E_c) may be positive (endothermic) or
negative (exothermic). This difference is useful for predicting which state (reactants or
products) is more likely at equilibrium.

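where N is the total number of electrons. The one-electron Hamiltonian (in atomic units) is
given by

$$h_i = -\frac{1}{2}\nabla_i^2 - \sum_{k=1}^{M}\frac{Z_k}{|\mathbf r_i - \mathbf R_k|} \qquad (2.2)$$

which is composed of the electron kinetic energy and the nuclear-electron attraction (M is
the total number of nuclei, with charges Z_k at positions R_k). Eigenfunctions ψ_i must
satisfy

$$h_i\,\psi_i = \varepsilon_i\,\psi_i \qquad (2.3)$$

which is the one-electron Schrödinger equation. The many-electron eigenfunction is then the
product of the one-electron eigenfunctions:

$$\Psi(\mathbf r_1,\ldots,\mathbf r_N) = \prod_{i=1}^{N}\psi_i(\mathbf r_i) \qquad (2.4)$$

Because each one-electron Hamiltonian (h_i) acts only on its corresponding eigenfunction
(ψ_i), the overall system Hamiltonian satisfies

$$H\Psi = \sum_{i=1}^{N} h_i \prod_{j=1}^{N}\psi_j = \left(\sum_{i=1}^{N}\varepsilon_i\right)\Psi \qquad (2.5)$$

so that the total energy is the sum of the one-electron eigenvalues.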
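The separability in Equations 2.1-2.5 can be checked numerically. The sketch below is an illustration under two assumptions. A one-dimensional finite-difference grid replaces three-dimensional space, and a harmonic well x²/2 is a hypothetical stand-in for the nuclear attraction in Equation 2.2. The code builds the two-electron Hamiltonian h ⊗ 1 + 1 ⊗ h and checks that product states are eigenfunctions with energies ε_i + ε_j.

```python
import numpy as np

def one_electron_h(n_grid=40, box=10.0):
    """Finite-difference h = -1/2 d^2/dx^2 + V(x) on a 1D grid (atomic units)."""
    x = np.linspace(-box / 2, box / 2, n_grid)
    dx = x[1] - x[0]
    # -1/2 d^2/dx^2 from the three-point stencil
    kinetic = (np.diag(np.ones(n_grid))
               - 0.5 * np.diag(np.ones(n_grid - 1), 1)
               - 0.5 * np.diag(np.ones(n_grid - 1), -1)) / dx**2
    potential = np.diag(0.5 * x**2)  # hypothetical stand-in for the nuclear attraction
    return kinetic + potential

h = one_electron_h()
eps, psi = np.linalg.eigh(h)  # one-electron eigenvalues and eigenvectors

# Two non-interacting electrons: H = h (x) 1 + 1 (x) h, the analogue of Eq. (2.1)
identity = np.eye(h.shape[0])
H2 = np.kron(h, identity) + np.kron(identity, h)

# A product of one-electron eigenfunctions (Eq. 2.4) is an eigenfunction of H2 ...
product = np.kron(psi[:, 0], psi[:, 1])
residual = H2 @ product - (eps[0] + eps[1]) * product
print("max residual:", np.max(np.abs(residual)))  # close to machine precision

# ... and the full two-electron spectrum is every pairwise sum eps_i + eps_j (Eq. 2.5)
pair_sums = np.sort(np.add.outer(eps, eps).ravel())
print(np.allclose(pair_sums, np.linalg.eigvalsh(H2)))  # True
```

The simple product ignores the antisymmetry required for fermions. The exchange energy discussed in Section 2.3.2 accounts for that.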
2.2.2 The Born-Oppenheimer approximation

Protons and electrons experience electrostatic forces of the same order of magnitude,
because their charges have equal magnitude. As a result, the changes in their momenta must
also be comparable. The mass of a nucleon is roughly 1836 times that of an electron, so the
nuclei (which consist of many nucleons) move with a much smaller velocity than the
electrons. On the time scale of nuclear motion, we can therefore consider the electrons to
be relaxed to the ground state electronic structure. The total wave function for nuclei and
electrons, Ψ, can be separated into the electronic wave function, ψ, and the nuclear wave
function, Φ:

$$\Psi(\{\mathbf r_i\},\{\mathbf R_n\}) = \psi(\{\mathbf r_i\};\{\mathbf R_n\})\,\Phi(\{\mathbf R_n\}) \qquad (2.6)$$

The Born-Oppenheimer approximation treats the nuclei as classical, stationary particles. The
electronic energy computed for each fixed nuclear configuration then defines the
Born-Oppenheimer potential energy surface. Nuclear kinetic energy is neglected when
determining the electronic structure.

This approximation does not work for light nuclei at low temperatures, where a quantum
mechanical description of the nuclei is required. That description is usually provided by
path integral Monte Carlo (PIMC) or path integral molecular dynamics (PIMD). For our
biological systems at standard pressure and physiological temperature, the Born-Oppenheimer
approximation is sufficient for accurate electronic structure calculations.
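The velocity argument above can be made quantitative. For equal momenta, the speed ratio is simply the inverse mass ratio. The sketch below assumes that every nucleon has the proton mass and ignores nuclear binding. Both simplifications change the result by well under 1%.

```python
# Illustrative arithmetic for the Born-Oppenheimer argument (atomic units, m_e = 1).
PROTON_TO_ELECTRON_MASS = 1836.15267  # m_p / m_e

def nucleus_to_electron_speed(n_nucleons):
    """Ratio v_nucleus / v_electron when both carry the same momentum p = m v.

    Approximates every nucleon by the proton mass (an assumption; the neutron
    is about 0.14% heavier) and ignores nuclear binding energy.
    """
    nucleus_mass = n_nucleons * PROTON_TO_ELECTRON_MASS  # in units of m_e
    return 1.0 / nucleus_mass

for symbol, nucleons in [("H", 1), ("C", 12), ("N", 14), ("O", 16), ("S", 32)]:
    print(f"{symbol}: v_nucleus / v_electron = {nucleus_to_electron_speed(nucleons):.1e}")
```

Even for hydrogen, the nucleus moves about 1836 times more slowly than an electron with the same momentum. For the C, N, O, and S atoms common in proteins, the ratio is of order 10^-5.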

2.3 Density functional theory 

First-principles density functional theory (DFT) [31][32] has been used in our study of
intein splicing and cleavage mechanisms. In this all-electron approach to the Schrödinger
equation, the electron density can be expressed in terms of the one-electron wave
functions:

$$\rho(\mathbf r) = \sum_{i=1}^{N} |\psi_i(\mathbf r)|^2 \qquad (2.7)$$

and from the density the number of electrons can be determined:

$$N = \int \rho(\mathbf r)\,d\mathbf r \qquad (2.8)$$

DFT allows the ground state properties of a chemical system to be determined without
considering the many-electron wave function. Instead, the N electrons are described by one
composite density. This density is determined in a static potential arising from
nucleus-electron and electron-electron interactions.
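Equations 2.7 and 2.8 can be illustrated on a grid. In this sketch, normalized particle-in-a-box orbitals are an assumed stand-in for real molecular orbitals. Each orbital is treated as a singly occupied spin orbital, so summing |ψ_i|² and integrating recovers the number of electrons.

```python
import numpy as np

def box_orbital(n, x, length):
    """Normalized particle-in-a-box orbital, used here only as a stand-in."""
    return np.sqrt(2.0 / length) * np.sin(n * np.pi * x / length)

def integrate(f, x):
    """Trapezoidal rule on a uniform grid."""
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))

length = 5.0
x = np.linspace(0.0, length, 2001)
orbitals = [box_orbital(n, x, length) for n in (1, 2, 3)]  # three occupied spin orbitals

density = sum(np.abs(psi) ** 2 for psi in orbitals)  # Eq. (2.7): rho = sum_i |psi_i|^2
n_electrons = integrate(density, x)                   # Eq. (2.8): N = integral of rho
print(round(n_electrons, 6))  # 3.0
```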

2.3.1 Classical beginnings 

So far, we have written the one-electron Hamiltonian for non-interacting electrons in the
field of fixed nuclei (Equation 2.2). To add explicit electron-electron interactions, the
energy is separated into potential and kinetic terms, starting from classical energy
expressions and eventually adding quantum terms. The classical potential energies between
nuclei and electrons (V_ne) and between electrons (V_ee) are given by functionals of the
density ρ(r):

$$V_{ne}[\rho(\mathbf r)] = -\sum_{k}^{\text{nuclei}} \int \frac{Z_k\,\rho(\mathbf r)}{|\mathbf r - \mathbf R_k|}\,d\mathbf r \qquad (2.9)$$

$$V_{ee}[\rho(\mathbf r)] = \frac{1}{2}\iint \frac{\rho(\mathbf r_1)\,\rho(\mathbf r_2)}{|\mathbf r_1 - \mathbf r_2|}\,d\mathbf r_1\,d\mathbf r_2 \qquad (2.10)$$

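The kinetic energy of non-interacting electrons, T_s, is used:

$$T_s[\rho(\mathbf r)] = -\frac{1}{2}\sum_{i=1}^{N}\int \psi_i^*(\mathbf r)\,\nabla^2\psi_i(\mathbf r)\,d\mathbf r \qquad (2.11)$$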

2.3.2 Quantum corrections to classical terms 

The actual kinetic energy is difficult to quantify because of electron correlation effects:
specifically, we have not explicitly calculated the correlation interaction arising from
excited electronic states. There is, however, a correction to this non-interacting kinetic
energy, which we call the correlation energy, E_c. In addition, the exchange interactions
due to spin have not yet been included. These interactions give the exchange energy, which
can be written exactly as

$$E_x = -\frac{1}{2}\sum_{a,b}\iint d\mathbf r_1\,d\mathbf r_2\,\frac{\psi_a^*(\mathbf r_1)\,\psi_b(\mathbf r_1)\,\psi_b^*(\mathbf r_2)\,\psi_a(\mathbf r_2)}{|\mathbf r_1 - \mathbf r_2|} \qquad (2.12)$$

where the sum runs over pairs of occupied orbitals with the same spin.

Electron exchange arises from the Pauli exclusion principle for fermions. The total wave
function for a fermionic system, which is the tensor product of the spatial wave function
and the spin wave function, must be antisymmetric. If two electrons are in the same spatial
state (e.g. a 1s orbital), their spins must be antiparallel. Conversely, electrons with
parallel spin are kept from entering the same spatial orbital and tend to avoid one another.
This effective repulsion between like-spin electrons is often described as the exchange
force.

The exact exchange shown in Equation 2.12 is rarely calculated explicitly in current DFT
methods, because the integration is computationally expensive. These two quantum mechanical
corrections are therefore combined into the exchange-correlation energy E_xc, defined as the
difference between the actual total energy and the classical total energy. Feynman called
the exchange-correlation energy the "stupidity energy" because of our inability to calculate
it exactly for general systems [33].

$$E_{xc}[\rho(\mathbf r)] = E_{total}[\rho(\mathbf r)] - T_s[\rho(\mathbf r)] - V_{ne}[\rho(\mathbf r)] - V_{ee}[\rho(\mathbf r)] \qquad (2.13)$$

A great deal of theoretical effort has been and continues to be spent on creating exchange- 
correlation functionals that are useful for a broad range of chemical systems. 

2.3.3 Hohenberg-Kohn theorems 

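As a result of the Born-Oppenheimer approximation, the Coulomb potential of the nuclei can be
considered a static external potential:

$$V_{ext}(\mathbf r) = -\sum_{k}^{\text{nuclei}} \frac{Z_k}{|\mathbf r - \mathbf R_k|} \qquad (2.14)$$

The Hamiltonian for the electrons can then be written as

$$\hat F = -\frac{1}{2}\sum_i \nabla_i^2 + \frac{1}{2}\sum_i\sum_{j\neq i}\frac{1}{|\mathbf r_i - \mathbf r_j|} \qquad (2.15)$$

so that the total Hamiltonian is

$$H = \hat F + \sum_i V_{ext}(\mathbf r_i) \qquad (2.16)$$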

2.3.3.1 Density determines unique potential 

Returning to the external potential, it can be shown that the ground state electronic density
determines it uniquely (up to an additive constant). This proof, which is by reductio ad
absurdum, was first given by Hohenberg and Kohn [31]. Let the external potential V_ext(r)
correspond to a ground state wave function |Ψ0⟩ and density ρ(r). Now consider a second
external potential V'_ext(r), which differs from V_ext(r) by more than a constant. It
corresponds to a different ground state wave function |Ψ'0⟩ but the same density ρ(r). The
ground state energies of the two systems are

$$E_0 = \langle\Psi_0|H|\Psi_0\rangle \qquad (2.17)$$

$$E'_0 = \langle\Psi'_0|H'|\Psi'_0\rangle \qquad (2.18)$$

We then use |Ψ'0⟩ as a trial wave function for the Hamiltonian H. Because |Ψ'0⟩ is not the
(nondegenerate) ground state of H, the variational principle gives

$$E_0 < \langle\Psi'_0|H|\Psi'_0\rangle \qquad (2.19)$$

and, by the same argument with the roles of the two systems exchanged,

$$E'_0 < \langle\Psi_0|H'|\Psi_0\rangle \qquad (2.20)$$

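By adding and subtracting H' on the right side of Equation 2.19, we can write:

$$\langle\Psi'_0|H|\Psi'_0\rangle = \langle\Psi'_0|H'|\Psi'_0\rangle + \langle\Psi'_0|(H - H')|\Psi'_0\rangle \qquad (2.21)$$

$$= E'_0 + \int d\mathbf r\,\rho(\mathbf r)\left[V_{ext}(\mathbf r) - V'_{ext}(\mathbf r)\right] \qquad (2.22)$$

Likewise, we can reverse the roles and take |Ψ0⟩ as a trial wave function for the
Hamiltonian H':

$$\langle\Psi_0|H'|\Psi_0\rangle = \langle\Psi_0|H|\Psi_0\rangle + \langle\Psi_0|(H' - H)|\Psi_0\rangle \qquad (2.23)$$

$$= E_0 + \int d\mathbf r\,\rho(\mathbf r)\left[V'_{ext}(\mathbf r) - V_{ext}(\mathbf r)\right] \qquad (2.24)$$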

Rewriting the inequalities of Equations 2.19 and 2.20 using Equations 2.22 and 2.24 gives

$$E_0 < E'_0 + \int d\mathbf r\,\rho(\mathbf r)\left[V_{ext}(\mathbf r) - V'_{ext}(\mathbf r)\right] \qquad (2.25)$$

$$E'_0 < E_0 + \int d\mathbf r\,\rho(\mathbf r)\left[V'_{ext}(\mathbf r) - V_{ext}(\mathbf r)\right] \qquad (2.26)$$

Adding Equations 2.25 and 2.26 cancels the integrals, and we find that

$$E_0 + E'_0 < E_0 + E'_0 \qquad (2.27)$$

which is clearly a contradiction. Therefore, the ground state density ρ(r) uniquely
determines the external potential V_ext(r). In addition, the number of electrons can be
determined by integrating the density over all space:

$$N = \int d\mathbf r\,\rho(\mathbf r) \qquad (2.28)$$
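The energy of the system can be written as

$$E_V[\rho(\mathbf r)] = F[\rho(\mathbf r)] + \int d\mathbf r\,V(\mathbf r)\,\rho(\mathbf r) \qquad (2.29)$$

where F[ρ] is the expectation value of the electronic Hamiltonian \hat F (Equation 2.15) in
the ground state with density ρ.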

2.3.4 Kohn-Sham equations 

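The variational problem for N electrons using the Hohenberg-Kohn density functional method
can be written as

$$\delta\left[F[\rho(\mathbf r)] + \int d\mathbf r\,V_{ext}(\mathbf r)\,\rho(\mathbf r) - \mu\left(\int d\mathbf r\,\rho(\mathbf r) - N\right)\right] = 0 \qquad (2.30)$$

where μ is a Lagrange multiplier. As first shown by Kohn and Sham, F[ρ(r)] can be separated
as

$$F[\rho(\mathbf r)] = T_s[\rho(\mathbf r)] + \frac{1}{2}\iint d\mathbf r\,d\mathbf r'\,\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{|\mathbf r - \mathbf r'|} + E_{xc}[\rho(\mathbf r)] \qquad (2.31)$$

where T_s is the non-interacting kinetic energy from Equation 2.11. From Equations 2.30 and
2.31 we rearrange so that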

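$$\frac{\delta T_s[\rho(\mathbf r)]}{\delta\rho(\mathbf r)} + V_{KS}(\mathbf r) = \mu \qquad (2.32)$$

where the Kohn-Sham potential is given by

$$V_{KS}(\mathbf r) = \int d\mathbf r'\,\frac{\rho(\mathbf r')}{|\mathbf r - \mathbf r'|} + V_{ext}(\mathbf r) + V_{xc}(\mathbf r) \qquad (2.33)$$

The exchange-correlation potential is related to the exchange-correlation functional by

$$V_{xc}(\mathbf r) = \frac{\delta E_{xc}[\rho(\mathbf r)]}{\delta\rho(\mathbf r)} \qquad (2.34)$$

It is important to note that Equation 2.32 would also be valid for a system of
non-interacting particles experiencing an external potential V_KS(r).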

To find the ground state density, ρ0(r), which corresponds to the energy minimum of the
electronic system, we use the one-electron Schrödinger equation based on Equations 2.2 and
2.3, with the nuclear attraction replaced by V_KS(r):

$$\left[-\frac{1}{2}\nabla^2 + V_{KS}(\mathbf r)\right]\psi_i(\mathbf r) = \varepsilon_i\,\psi_i(\mathbf r) \qquad (2.35)$$

The ground state density can be found from the wave functions by this relation:

$$\rho(\mathbf r) = 2\sum_{i=1}^{N/2} |\psi_i(\mathbf r)|^2 \qquad (2.36)$$

where the factor of two arises from spin degeneracy. Each spatial orbital is assumed to be
doubly occupied, with one spin-up and one spin-down electron.

2.3.4.1 Variational principle 

We have shown in the above section that the ground state density uniquely determines the
external potential and the ground state wave function. We now show how to determine the
ground state density. The variational principle dictates that every trial wave function
other than the unique ground state wave function gives an energy higher than the ground
state energy:

$$E[\Psi] = \frac{\langle\Psi|H|\Psi\rangle}{\langle\Psi|\Psi\rangle} \geq E_0 \qquad (2.37)$$

where E0 is the smallest eigenvalue of H. Expressed in terms of the electron density ρ(r),
the calculated energy E_V[ρ(r)] is never lower than the ground state energy E0, which is a
minimum:

$$E_V[\rho(\mathbf r)] \geq E_0 \qquad (2.38)$$

This variational principle on the energy can be proven by noting that a trial density ρ(r)
determines its own external potential and ground state |Ψ'⟩. Using this state as a trial
state for the actual external potential V(r), we can write the total energy of the test
system as

$$\langle\Psi'|H|\Psi'\rangle = \langle\Psi'|\hat F|\Psi'\rangle + \langle\Psi'|\hat V|\Psi'\rangle \qquad (2.39)$$

$$= F[\rho(\mathbf r)] + \int d\mathbf r\,V(\mathbf r)\,\rho(\mathbf r) \qquad (2.40)$$

$$= E_V[\rho(\mathbf r)] \geq E_0 \qquad (2.41)$$

Minimizing the functional E_V[ρ(r)] therefore yields the ground state energy E0, which is
reached only at the ground state density ρ0(r).
2.3.5 Self-consistent procedure

Because the Kohn-Sham potential V_KS(r) depends on the density ρ(r), the equations must be
solved self-consistently. First, a guess is made for the density. The Kohn-Sham equation
(Equation 2.35) is solved, and a new set of orbitals {ψ_i(r)} is determined. From these
orbitals, a new density is found using Equation 2.36. This new density then becomes the next
guess, which gives new orbitals and another new density. The process continues until the
output and input densities agree to within a chosen tolerance. The energy of this
self-consistent density is the ground state energy for the chosen exchange-correlation
functional.
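The self-consistent cycle can be sketched as a short loop. This is a toy model, not the Kohn-Sham calculation used in this work. The grid is one-dimensional, and the effective potential x²/2 + gρ(x) is a hypothetical stand-in for V_KS(r): a fixed external well plus a density-dependent repulsion. Linear mixing of the input and output densities is one common way to stabilize the iterations. Production codes typically use more sophisticated schemes such as DIIS.

```python
import numpy as np

def scf_1d(n_electrons=4, g=1.0, n_grid=200, box=10.0, mixing=0.3, tol=1e-8, max_iter=1000):
    """Toy self-consistent field loop on a 1D grid (atomic units).

    Hypothetical model: V_eff(x) = x^2/2 + g * rho(x) stands in for V_KS.
    Orbitals are doubly occupied, mirroring Eq. (2.36).
    """
    x = np.linspace(-box / 2, box / 2, n_grid)
    dx = x[1] - x[0]
    # -1/2 d^2/dx^2 from the three-point finite-difference stencil
    kinetic = (np.diag(np.ones(n_grid))
               - 0.5 * np.diag(np.ones(n_grid - 1), 1)
               - 0.5 * np.diag(np.ones(n_grid - 1), -1)) / dx**2
    v_ext = 0.5 * x**2
    n_occ = n_electrons // 2

    rho_in = np.zeros(n_grid)  # initial guess: no density, i.e. the bare external well
    for iteration in range(1, max_iter + 1):
        h_eff = kinetic + np.diag(v_ext + g * rho_in)        # potential from the input density
        eps, psi = np.linalg.eigh(h_eff)                      # solve the one-electron equations
        psi = psi / np.sqrt(dx)                               # normalize: sum |psi|^2 dx = 1
        rho_out = 2.0 * np.sum(psi[:, :n_occ] ** 2, axis=1)   # new density (double occupancy)
        if np.max(np.abs(rho_out - rho_in)) < tol:            # input and output densities agree
            return x, rho_out, eps[:n_occ], iteration
        rho_in = (1.0 - mixing) * rho_in + mixing * rho_out   # linear mixing damps oscillations
    raise RuntimeError("SCF did not converge within max_iter iterations")

x, rho, eps, n_iter = scf_1d()
print(f"converged after {n_iter} iterations")
print(f"N = {rho.sum() * (x[1] - x[0]):.6f}")  # 4.000000
print("occupied orbital energies:", np.round(eps, 4))
```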

2.3.6 DFT implementation of exchange and correlation 

So far, the exact nature of the exchange-correlation functional E_xc has not been discussed.
We have only noted that it is the "quantum correction" to the classical kinetic and potential
energies, and that it should include energetic corrections for fermionic spin exchange. We
write E_xc in terms of ε_xc, the exchange-correlation energy per electron (the 'energy
density'):

$$E_{xc}[\rho(\mathbf r)] = \int d\mathbf r\,\varepsilon_{xc}[\rho(\mathbf r)]\,\rho(\mathbf r) \qquad (2.42)$$

2.3.6.1 Local density approximation (LDA) 

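If ε_xc is computed solely from the density at each local position r, the method is called
the Local Density Approximation (LDA). The LDA exchange-correlation energy is based on
Thomas-Fermi theory and on Quantum Monte Carlo results for a homogeneous electron gas [34].
It is written

$$\varepsilon_{xc}^{LDA}[\rho(\mathbf r)] = \varepsilon_x[\rho(\mathbf r)] + \varepsilon_c[\rho(\mathbf r)] = -\frac{3}{4\pi}\left(\frac{9\pi}{4}\right)^{1/3}\frac{1}{r_s} + \varepsilon_c[\rho(\mathbf r)] \qquad (2.43)$$

where r_s is the Wigner-Seitz radius,

$$r_s = \left(\frac{3}{4\pi\rho(\mathbf r)}\right)^{1/3} \qquad (2.44)$$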
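As a worked example of Equations 2.42-2.44, the sketch below evaluates the LDA exchange energy of the hydrogen-atom 1s density, ρ(r) = e^(-2r)/π in atomic units, by radial integration. Treating this one-electron density as spin-unpolarized is a simplification made only for illustration.

```python
import numpy as np

def eps_x_lda(rho):
    """LDA exchange energy per electron, Eqs. (2.43)-(2.44), in Hartree."""
    r_s = (3.0 / (4.0 * np.pi * rho)) ** (1.0 / 3.0)  # Wigner-Seitz radius
    return -(3.0 / (4.0 * np.pi)) * (9.0 * np.pi / 4.0) ** (1.0 / 3.0) / r_s

def lda_exchange_energy(rho, r):
    """E_x = integral of rho * eps_x over all space, Eq. (2.42), for a spherical density."""
    integrand = 4.0 * np.pi * r**2 * rho * eps_x_lda(rho)
    dr = r[1] - r[0]
    return dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

r = np.linspace(0.0, 30.0, 30001)
rho_h = np.exp(-2.0 * r) / np.pi  # hydrogen-atom 1s density (integrates to one electron)
print(f"E_x(LDA) = {lda_exchange_energy(rho_h, r):.4f} Hartree")  # about -0.2127
```

The result, about -0.213 Hartree, is far from the exact exchange energy of the hydrogen atom, -0.3125 Hartree. This gap illustrates why the gradient corrections and hybrid functionals described below are used.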



2.3.6.2 Generalized gradient approximation (GGA) 

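Because this electron-gas model does not reliably predict accurate chemical bonding
properties, the next term in a gradient expansion of the density is typically included.
This is called the Generalized Gradient Approximation (GGA), in which ε_xc also depends on
the dimensionless reduced density gradient

$$x(\mathbf r) = \frac{|\nabla\rho(\mathbf r)|}{\rho^{4/3}(\mathbf r)} \qquad (2.45)$$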
GGA has shown remarkable flexibility for various chemical systems [35], especially for
covalent bond energies and distances. Known shortcomings are a poor description of van der
Waals bonding [36] and incorrect unoccupied orbital energies, which lead to underestimated
band gaps in semiconductors [37].



2.3.6.3 Hybrid DFT methods 

Primarily, we have used B3LYP, the Becke three-parameter hybrid functional [38]. The
exchange (x) and correlation (c) energy is written as E_xc^B3LYP, where

$$E_{xc}^{B3LYP} = (1-a)\,E_x^{LDA} + a\,E_x^{HF} + b\,E_x^{Becke} + (1-c)\,E_c^{LDA} + c\,E_c^{LYP} \qquad (2.46)$$

and the coefficients were optimized to match extensive molecular data sets (a = 0.20,
b = 0.72, and c = 0.81) [38]. The first term in the hybrid method is E_x^LDA, the LDA exchange
term from Equation 2.43. E_x^HF is the Hartree-Fock exchange integral, which is exact and is
given in Equation 2.12. Becke's B88 exchange term [39] is empirical and is written as

$$E_x^{Becke}[\rho(\mathbf r)] = -\beta\int d\mathbf r\,\rho(\mathbf r)^{4/3}\,\frac{x^2}{1 + 6\beta\,x\,\sinh^{-1}x} \qquad (2.47)$$

where

$$x = \frac{|\nabla\rho(\mathbf r)|}{\rho(\mathbf r)^{4/3}}$$

The parameter β = 0.0042 (atomic units) was obtained by fitting the exact Hartree-Fock
exchange energies of the noble gas atoms. The correlation functionals are from the LDA [40]
and from Lee, Yang, and Parr (LYP) [41][42]. LYP is based on an empirically determined model
of the correlation energy of electrons in a helium atom.
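The effect of the gradient correction can be seen on the same hydrogen-atom density. This sketch assumes the spin-resolved form, in which Equation 2.47 is applied to each spin channel separately. For hydrogen, only the spin-up density is non-zero. The local part uses the spin-polarized LDA exchange, and the radial grid is an illustrative choice.

```python
import numpy as np

BETA = 0.0042  # Becke's empirical parameter (atomic units)

def radial_integral(f, r):
    """Integral of a spherical function f(r) over all space (trapezoidal rule)."""
    g = 4.0 * np.pi * r**2 * f
    dr = r[1] - r[0]
    return dr * (g.sum() - 0.5 * (g[0] + g[-1]))

r = np.linspace(0.0, 30.0, 60001)
rho_up = np.exp(-2.0 * r) / np.pi  # hydrogen 1s: one spin-up electron, rho_down = 0
grad_up = 2.0 * rho_up             # |d rho_up / dr| for this exponential density
rho43 = rho_up ** (4.0 / 3.0)

# Spin-polarized LDA exchange for a single spin channel
e_lsda = -1.5 * (3.0 / (4.0 * np.pi)) ** (1.0 / 3.0) * radial_integral(rho43, r)

# Becke 88 gradient correction, the form of Eq. (2.47) applied to the spin-up density
x = grad_up / rho43
e_b88_corr = -BETA * radial_integral(rho43 * x**2 / (1.0 + 6.0 * BETA * x * np.arcsinh(x)), r)

print(f"LSDA exchange:       {e_lsda:.4f} Hartree")               # about -0.2680
print(f"LSDA + B88:          {e_lsda + e_b88_corr:.4f} Hartree")  # about -0.310
print("exact exchange (HF): -0.3125 Hartree")
```

Adding the B88 correction moves the exchange energy from about -0.268 to about -0.310 Hartree, close to the exact value of -0.3125 Hartree.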

As implemented in the Gaussian code [43], this hybrid gradient-corrected method is widely
considered one of the most accurate exchange-correlation functionals for organic and
biological molecules. It has been used with great success in other biological systems
[44][45]. We also performed calculations with post-Hartree-Fock Møller-Plesset perturbation
theory (MP2) [46][47][48][49] to test the accuracy of the B3LYP method for this system. The
resulting energy barriers were consistent with the B3LYP values.
2.3.7 Basis sets

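In principle any set of basis functions may be used, although for computational efficiency
only a few types are commonly implemented. A primitive Gaussian-type orbital (GTO) in
atom-centered Cartesian coordinates is

$$\phi(x,y,z;\alpha,i,j,k) = \left(\frac{2\alpha}{\pi}\right)^{3/4}\left[\frac{(8\alpha)^{i+j+k}\,i!\,j!\,k!}{(2i)!\,(2j)!\,(2k)!}\right]^{1/2} x^i\,y^j\,z^k\,e^{-\alpha(x^2+y^2+z^2)} \qquad (2.48)$$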



where α controls the width of the GTO, and i, j, and k are non-negative integers that define
the angular momentum of the orbital. For example, for an s-type GTO (orbital angular
momentum l = 0), the integers are i = j = k = 0, and φ is spherically symmetric. For a p-type
GTO (l = 1), there are three possibilities for i, j, and k, which lead to a Cartesian
prefactor of x, y, or z. For d-type GTOs (l = 2), there are six possibilities for the index
values (i, j, k). Specifically, the Cartesian prefactors are x², y², z², xy, xz, and yz. Each
Cartesian prefactor is multiplied by the Gaussian term to obtain the proper atomic orbital
shape.
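The normalization prefactor in Equation 2.48 can be checked numerically. Because the Gaussian factorizes into x, y, and z parts, the three-dimensional overlap is a product of one-dimensional integrals. The exponent α = 0.8 is an arbitrary illustrative value, and the loop covers the six d-type Cartesian components.

```python
import math
import numpy as np

def gto_norm(alpha, i, j, k):
    """Normalization constant of a primitive Cartesian GTO, as in Eq. (2.48)."""
    ratio = (8.0 * alpha) ** (i + j + k) * math.factorial(i) * math.factorial(j) * math.factorial(k)
    ratio /= math.factorial(2 * i) * math.factorial(2 * j) * math.factorial(2 * k)
    return (2.0 * alpha / math.pi) ** 0.75 * math.sqrt(ratio)

def axis_integral(power, alpha, grid):
    """Trapezoidal estimate of the 1D integral of x^(2*power) * exp(-2*alpha*x^2)."""
    f = grid ** (2 * power) * np.exp(-2.0 * alpha * grid**2)
    dx = grid[1] - grid[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))

def self_overlap(alpha, ijk, grid):
    """<phi|phi>; the 3D integral factorizes into x, y, and z integrals."""
    norm = gto_norm(alpha, *ijk)
    return norm**2 * math.prod(axis_integral(p, alpha, grid) for p in ijk)

grid = np.linspace(-12.0, 12.0, 24001)
alpha = 0.8  # arbitrary exponent chosen for illustration
d_components = {"xx": (2, 0, 0), "yy": (0, 2, 0), "zz": (0, 0, 2),
                "xy": (1, 1, 0), "xz": (1, 0, 1), "yz": (0, 1, 1)}
for label, ijk in d_components.items():
    print(label, f"{self_overlap(alpha, ijk, grid):.8f}")  # each prints 1.00000000
```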

GTOs are easily integrated by computational schemes, but they do not closely resemble atomic
orbitals (AOs). Primitive GTOs are smooth and differentiable at the nucleus (r = 0). For a
hydrogenic system, however, the actual AO has a cusp (a discontinuous derivative) at r = 0.
It is better described by an exponential, e^(-ζr), called a Slater-type orbital (STO), whose
multi-center integrals are much harder to evaluate. To use basis functions with a realistic
radial shape whose integrals are still easy to evaluate, we have used contracted Gaussian
functions. These are linear combinations of primitive GTOs designed to resemble an STO. A
general contracted GTO may be written as

$$\varphi(x,y,z;\{\alpha\},i,j,k) = \sum_{a=1}^{M} c_a\, \phi(x,y,z;\alpha_a,i,j,k) \qquad (2.49)$$

where M is the number of primitive GTOs used in the linear combination. The coefficients c_a are chosen to normalize the function and to optimize its shape. Molecular orbitals are then constructed as linear combinations of these basis functions, the linear combination of atomic orbitals (LCAO) approach.
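To illustrate Eq. (2.49), the sketch below contracts three normalized s primitives using the standard published STO-3G exponents and coefficients for a 1s Slater function with exponent ζ = 1. This is an illustrative choice and not the 6-31G(d,p) or 6-311++G(d,p) contraction used in this work. Comparing the contraction with the exact Slater function shows the point made above: the contraction tracks the STO away from the nucleus, but it underestimates the STO at r = 0 and has zero slope there, where the STO has a cusp.

```python
import math

# STO-3G contraction of a 1s Slater function with zeta = 1.0
# (exponents alpha_a and coefficients c_a refer to normalized s primitives).
STO3G_ALPHA = (0.109818, 0.405771, 2.227660)
STO3G_COEF = (0.444635, 0.535328, 0.154329)


def s_primitive(r, alpha):
    """Normalized s-type primitive GTO, Eq. (2.48) with i = j = k = 0."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)


def contracted_s(r, alphas=STO3G_ALPHA, coefs=STO3G_COEF):
    """Contracted GTO, Eq. (2.49): a fixed linear combination of M primitives."""
    return sum(c * s_primitive(r, a) for a, c in zip(alphas, coefs))


def slater_1s(r, zeta=1.0):
    """Normalized 1s Slater-type orbital, which has a cusp at r = 0."""
    return math.sqrt(zeta**3 / math.pi) * math.exp(-zeta * r)


def slope_at_origin(f, h=1e-6):
    """One-sided finite-difference estimate of df/dr at r = 0."""
    return (f(h) - f(0.0)) / h


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(f"r = {r:3.1f}  STO = {slater_1s(r):.4f}  STO-3G = {contracted_s(r):.4f}")
    print("slope at r = 0:  STO", slope_at_origin(slater_1s), " STO-3G", slope_at_origin(contracted_s))
```

Larger contractions and split-valence sets, such as those described next, reduce but never remove the error at the nucleus, because every Gaussian has zero slope at r = 0.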

We have used the double-ζ basis set 6-31G(d,p) for geometry optimizations during initial reaction path sampling [50]. Here, the '6' denotes that each core orbital is a contraction of six primitive GTOs, and the '31' denotes that each valence orbital is split into two functions, one contracted from three primitives and one consisting of a single primitive. Split-valence basis sets allow a more accurate description of chemical bonding because of their increased flexibility in fitting valence electrons into molecular orbitals, and they are the norm when using a Gaussian-type basis set. The '(d,p)' indicates polarization functions (d functions on heavy atoms and p functions on hydrogen), which allow the wave function to distort away from the atomic center. We have also used the triple-ζ basis set 6-311++G(d,p) for calculations of the local minima and transition states found with the first basis set [51]. Diffuse functions, which describe long-range interactions, are denoted by '+' (on heavy atoms) and '++' (on heavy atoms and hydrogen), and are especially important for anions. Basis sets of similar size are typically used for systems with a similar number of electrons, and our test calculations, as well as the work of others, have shown these basis sets to be sufficient for similar atom types [44, 45].
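The Pople naming convention maps directly onto a count of contracted basis functions per atom. The hypothetical helper below (our own, not part of any program) decodes names such as 6-31G(d,p) and 6-311++G(d,p) for H and for the first-row atoms C, N, and O. It assumes a single core 1s shell, one sp diffuse shell per '+' on heavy atoms, and an s diffuse shell on hydrogen for '++'. Whether a d shell counts as five spherical or six Cartesian functions depends on program defaults, so both options are offered.

```python
import re

# Hypothetical parser for Pople-style names such as "6-31G(d,p)" or "6-311++G(2d,p)".
POPLE = re.compile(r"^(\d)-(\d+)(\+{0,2})G(?:\((\d?[a-z])(?:,(\d?[a-z]))?\))?$")


def _shells(token, letter):
    """Number of polarization shells of type `letter` in a token such as 'd' or '2d'."""
    if token is None or token[-1] != letter:
        return 0
    return int(token[:-1]) if len(token) > 1 else 1


def count_functions(basis, element, cartesian_d=False):
    """Count contracted basis functions on a single H, C, N, or O atom."""
    m = POPLE.match(basis)
    if m is None:
        raise ValueError(f"unsupported basis name: {basis!r}")
    _core, valence, plus, pol_heavy, pol_h = m.groups()
    n_split = len(valence)  # one contracted function per valence digit, per orbital
    if element == "H":
        n = n_split                     # split-valence 1s
        n += 1 if plus == "++" else 0   # second '+' adds a diffuse s shell on H
        n += 3 * _shells(pol_h, "p")    # p polarization on H
        return n
    if element in {"C", "N", "O"}:
        n = 1 + 4 * n_split             # one core 1s plus split 2s/2p valence
        n += 4 if plus else 0           # diffuse sp shell on heavy atoms
        n += (6 if cartesian_d else 5) * _shells(pol_heavy, "d")
        return n
    raise ValueError(f"element {element!r} is not handled in this sketch")


if __name__ == "__main__":
    for name in ("6-31G(d,p)", "6-311++G(d,p)"):
        print(name, {el: count_functions(name, el) for el in ("H", "C", "N", "O")})
```

For example, a water molecule in 6-31G(d,p) has 14 + 5 + 5 = 24 functions with spherical d functions, or 25 with six Cartesian d functions.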

2.4 Calculation methodology 

In addition to using highly efficient computational codes tuned for parallel processing, we have written various programs for data analysis and manipulation. These are especially useful for trajectories of large QM/MM systems containing thousands of atoms, which would otherwise be extremely cumbersome to handle. The electronic structure calculations are performed with robust codes that are widely used in the field, and each method is described below.

2.4.1 Gas phase 

Calculations in vacuo are the simplest way to understand the energies associated with a reaction mechanism, since they treat the quantum mechanical active site as an isolated molecule or group. Gas phase calculations capture the energy necessary to traverse the energy barrier, which is typically then lowered by electrostatic interactions with the remaining protein and/or solvent. By studying mechanisms in the gas phase, we can separate the energy associated with a particular bond breaking or forming event from other factors such as structural rearrangement and polarization effects. One drawback of using a small model system in gas phase calculations is that the protein backbone is either held rigid in its starting configuration, or completely relaxed and prone to rearrangements that are unlikely in the protein context. Despite these limitations, gas phase reaction profiles are an important part of describing the energetics of a reaction.

2.4.2 Semi-empirical PM3 

Semi-empirical PM3 [52] calculations were performed to obtain a two-dimensional potential energy surface in the xy plane spanned by two constrained reaction coordinates. Geometry optimizations with PM3 are computationally efficient and sufficiently accurate for the present purposes. Deficiencies of PM3 with respect to chemical structures are well known, such as the "flattening" of small and medium sized rings [53], pyramidal geometries for nitrogen atoms with a lone pair [54], and inaccurate hydrogen bonding distances. Nevertheless, PM3 is useful for efficiently scanning many geometries and identifying the approximate locations of transition states, as is done here.

2.4.3 Implicit solvent 

One method for approximating the electrostatic effect of the environment is to use an implicit solvent. In this scheme, the active site polarizes, and is polarized by, a surrounding dielectric continuum. The Polarizable Continuum Model (PCM) [55] was used to simulate solvent effects in the detailed calculations. The Integral Equation Formalism variant (IEFPCM) [56] was used because it allows interlocking atomic spheres to represent the extent of the system in solution. This is important for protons that lie between atoms during a chemical reaction, particularly at or near the energy barrier.

The dimensionless dielectric constant is defined by ε_r = ε_s/ε_0, where ε_0 is the vacuum permittivity and ε_s is the static permittivity of the dielectric medium. For the gas phase, ε_r = 1; for water, ε_r = 78.39. Geometry optimizations were performed in implicit solvent, and the results are compared with gas phase calculations.
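Why the dielectric constant matters, and why its effect saturates, can be seen from the much simpler Born model of a charged sphere in a continuum, ΔG = -(q²/(8πε_0 a))(1 - 1/ε_r). The sketch below is a deliberately crude stand-in for IEFPCM and is not the model used in our calculations. The 2 Å radius is a hypothetical value, and the dielectric constants are those listed in the Figure 3.5 caption. The (1 - 1/ε_r) factor already reaches about 82% of its high-dielectric limit at ε_r ≈ 5.6, which is why moderately polar and highly polar media give similar stabilization.

```python
# Coulomb constant e^2 / (4 pi eps0) expressed in kcal mol^-1 Angstrom e^-2
COULOMB_KCAL_ANG = 332.0637

# Static dielectric constants listed in the Figure 3.5 caption
SOLVENTS = {
    "vacuum": 1.000, "argon": 1.43, "benzene": 2.247, "chlorobenzene": 5.621,
    "dichloroethane": 10.36, "ethanol": 24.55, "methanol": 32.63, "water": 78.39,
}


def born_energy(charge, radius, eps_r):
    """Born solvation free energy (kcal/mol) of a sphere with charge (e) and radius (Angstrom)."""
    return -0.5 * COULOMB_KCAL_ANG * charge**2 / radius * (1.0 - 1.0 / eps_r)


if __name__ == "__main__":
    limit = born_energy(1.0, 2.0, float("inf"))  # 1/inf -> 0, the conductor limit
    for name, eps in SOLVENTS.items():
        e = born_energy(1.0, 2.0, eps)
        print(f"{name:15s} eps_r = {eps:7.3f}  dG = {e:8.2f} kcal/mol  ({e / limit:6.1%} of limit)")
```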

2.4.4 Multiscale modeling 

2.4.4.1 Combined quantum and classical mechanics 

The Quantum Mechanics/Molecular Mechanics (QM/MM) layering method is used. It treats the protein active site and critical solvent molecules with first-principles methods, while the remaining full-protein system is treated with classical force fields [57, 58]. Similar multiscale methods have been used with good success [59, 60, 61, 62, 63]. The classical system is periodic and is truncated to include the protein (intein and exteins), all interior water molecules, and those exterior water molecules within 7.0 Å of the protein surface. All atoms are relaxed, and each calculation includes at least 6000 atoms, roughly 2350 of which belong to the protein. The full-protein plus solvent system, termed the real system, is treated only with the classical MM method. Within the real system, the active-site model system is partitioned and treated independently by both the QM and MM methods. Dangling bonds introduced by partitioning the model system are passivated with hydrogen atoms. In standard QM/MM energy calculations and geometry optimizations, the protein and solvent outside the model system are typically included only as a mechanical perturbation. For this reason, it is critical that the model system include the protein segments and solvent molecules that interact electrostatically with the active site. The combined ONIOM energy may be written as

$$E^{\mathrm{QM:MM}}_{\mathrm{ONIOM}} = E^{\mathrm{QM}}_{\mathrm{Model}} - E^{\mathrm{MM}}_{\mathrm{Model}} + E^{\mathrm{MM}}_{\mathrm{Real}} \qquad (2.50)$$
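The bookkeeping behind Eq. (2.50) and the hydrogen capping of dangling bonds can be sketched in a few lines of Python. The function names are hypothetical. The fixed 1.09 Å link-atom distance (a typical C-H length) is also an assumption: programs such as Gaussian instead place the link atom by scaling the original QM-MM bond length, so this sketch is a simplification rather than any program's exact rule.

```python
import math


def oniom_energy(e_qm_model, e_mm_model, e_mm_real):
    """Two-layer subtractive ONIOM energy, Eq. (2.50); all inputs in the same units."""
    return e_qm_model - e_mm_model + e_mm_real


def link_atom_position(qm_atom, mm_atom, bond_length=1.09):
    """Place a capping hydrogen on the QM-to-MM bond vector.

    The H sits a fixed distance (Angstrom) from the QM atom along the direction
    of the severed bond; 1.09 Angstrom is a hypothetical default.
    """
    d = [m - q for q, m in zip(qm_atom, mm_atom)]
    length = math.sqrt(sum(c * c for c in d))
    return tuple(q + bond_length * c / length for q, c in zip(qm_atom, d))


if __name__ == "__main__":
    # Hypothetical C(QM)-C(MM) bond of 1.53 Angstrom along x.
    print(link_atom_position((0.0, 0.0, 0.0), (1.53, 0.0, 0.0)))
    # Hypothetical energies, only to show the subtraction of the doubly counted model region.
    print(oniom_energy(-100.0, 5.0, 12.0))
```

The subtraction removes the MM description of the model region, which is otherwise counted twice (once in the QM calculation and once inside the real system).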

2.4.4.2 Charge embedding 

In addition to the mechanical perturbation on the QM Hamiltonian, the electrostatic contribution from the partial charges of the MM region can be included as a perturbation on the QM Hamiltonian. In this scheme, the partial charges are those used in the MM calculation, scaled in the program's default manner near the QM/MM boundary so that the charges of MM atoms bonded closest to the QM region are not included [58]. Typically, we report $E^{\mathrm{QM}}_{\mathrm{Model}}$, which represents the QM active-site energy. The other energy terms, including the combined $E^{\mathrm{QM:MM}}_{\mathrm{ONIOM}}$, involve classical parameters that have no relevance to the energies of bond forming and breaking at transition states.

2.4.5 Geometry minimization 

Due to the complexity of biomolecular reactions, a rigorous multidimensional search over local conformational space is in principle required, although it is not computationally feasible for large systems [64]. Because each calculation is time consuming, we have used a constrained minimization procedure. For intermediate states along the reaction path, one coordinate is constrained while the rest of the system is relaxed. The constrained internal coordinate, called the Asn cyclization distance, is the distance between the Asn side-chain N atom and the carbonyl C of Asn on the scissile peptide bond. In calculations with a hydronium ion (H3O+), the three O-H bond distances were often constrained to 0.98 Å to prevent the spontaneous proton donation that was otherwise observed.
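The constrained minimization procedure amounts to a relaxed scan: the chosen coordinate is fixed at a series of values and every other degree of freedom is minimized. The Python sketch below shows the idea on a hypothetical two-coordinate model surface. In this model, y plays the role of the constrained distance and a golden-section line search over x stands in for the full quantum chemical geometry optimization. Neither the surface nor the search is taken from our calculations.

```python
import math


def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal 1D function on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    x = 0.5 * (a + b)
    return x, f(x)


def relaxed_scan(energy, y_values, x_bounds):
    """Constrain y at each value, relax x, and return (y, x_opt, E_opt) triples."""
    profile = []
    for y in y_values:
        x_opt, e_opt = golden_min(lambda x: energy(x, y), *x_bounds)
        profile.append((y, x_opt, e_opt))
    return profile


def toy_energy(x, y):
    """Hypothetical surface: double well along y (minima at y = +/-1), valley at x = y / 2."""
    return (x - 0.5 * y) ** 2 + (y * y - 1.0) ** 2


if __name__ == "__main__":
    ys = [i * 0.1 for i in range(-10, 11)]  # scan between the two minima
    profile = relaxed_scan(toy_energy, ys, (-5.0, 5.0))
    y_top, x_top, e_top = max(profile, key=lambda p: p[2])
    print(f"highest point on the relaxed profile: y = {y_top:.2f}, x = {x_top:.3f}, E = {e_top:.3f}")
```

The highest point of such a profile only approximates the transition state. On real surfaces the relaxed coordinates can jump discontinuously between scan points, which is one reason the barrier structures are refined separately.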

2.4.6 Free energy calculation 

Thermal and entropic contributions, calculated within the harmonic approximation for the geometries optimized at the B3LYP/6-31G(d,p) level, were combined with the electronic energy to obtain free-energy profiles in the gas phase and in implicit solvent. Zero-point energies were found to differ by between 0.04 and 1.33 kcal/mol, which is within the expected error of the calculation. The approximate entropic components of the free energy include translational, electronic, rotational, and vibrational contributions [65, 66] and were obtained from frequency calculations at room temperature. For transition states, the imaginary frequency of the reaction-coordinate mode is not included in the thermal corrections.
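The vibrational part of these corrections follows from standard harmonic-oscillator expressions. The Python sketch below computes the zero-point energy and the harmonic vibrational free energy from a list of wavenumbers. Like many programs, it represents imaginary frequencies as negative numbers and skips them, which mirrors the transition-state treatment described above. Translational, rotational, and electronic terms are omitted, so this is a partial illustration rather than the full thermochemistry performed by the quantum chemistry code.

```python
import math

# Physical constants (SI, exact or CODATA values)
H = 6.62607015e-34     # Planck constant, J s
C_CM = 2.99792458e10   # speed of light, cm/s
KB = 1.380649e-23      # Boltzmann constant, J/K
NA = 6.02214076e23     # Avogadro constant, 1/mol
J_PER_KCAL = 4184.0


def harmonic_vib_terms(wavenumbers_cm, temperature=298.15):
    """Return (ZPE, G_vib) in kcal/mol from harmonic wavenumbers in cm^-1.

    G_vib = sum over modes of [h nu / 2 + kT ln(1 - exp(-h nu / kT))].
    Negative entries denote imaginary frequencies and are skipped.
    """
    zpe = 0.0
    g_vib = 0.0
    kt = KB * temperature
    for nu in wavenumbers_cm:
        if nu <= 0.0:
            continue  # imaginary mode (e.g., the reaction coordinate at a TS)
        e = H * C_CM * nu  # vibrational quantum, J per molecule
        zpe += 0.5 * e
        g_vib += 0.5 * e + kt * math.log(1.0 - math.exp(-e / kt))
    to_kcal = NA / J_PER_KCAL
    return zpe * to_kcal, g_vib * to_kcal


if __name__ == "__main__":
    # Hypothetical transition-state spectrum with one imaginary mode.
    freqs = [-512.0, 85.0, 240.0, 1010.0, 1650.0, 3400.0]
    zpe, g_vib = harmonic_vib_terms(freqs)
    print(f"ZPE = {zpe:.3f} kcal/mol, G_vib = {g_vib:.3f} kcal/mol")
```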



CHAPTER 3 
C-TERMINAL CLEAVAGE 



3.1 Introduction: C-terminal cleavage 

3.1.1 Experimental motivation 

C-terminal cleavage involves the isolated excision of the C-extein from the intein substrate. In the experimental study by Wood et al. on the Mycobacterium tuberculosis (Mtu) recA cleavage mutant mini-intein (ΔI-CM), a decrease in solution pH from 7.5 to 6.0 was found to significantly increase the rate of C-terminal cleavage [19, 23]. The higher C-terminal cleavage activity at lower pH for this intein, as well as for the Ssp DnaB intein, is, however, inconsistent with the currently available details of the mechanism proposed for intein cleavage within the context of splicing. For example, Ding et al. have suggested that in the reaction of the Ssp DnaB mini-intein, C2-His12 (F-block) acts as a base, deprotonating the nitrogen of the C-terminal C1-Asn8 side chain via a vicinal water molecule. Experiments on short peptides in solution also show an increasing tendency of Asn to ionize and to cyclize (leading to succinimide formation) over the pH range 7.4-13.8 [67, 68, 69, 70]. For the side chain of His to act as a base and accept protons, one of its two imidazole nitrogen atoms must, on average, be deprotonated. At lower pH, especially below the pKa, His is more likely to be present in the doubly protonated, positively charged state, which diminishes its ability to accept protons. Thus, the mechanistic details underlying the increased C-terminal cleavage activity observed in the experiments of Wood et al. [19, 24] are expected to differ from those proposed by Ding et al. [71].

3.1.2 Reaction mechanism 

We have studied the cleavage mechanism of the Mtu recA intein using information from available intein crystal structures and mutagenesis experiments. The part of the intein structure close to the C-terminal reaction site was considered in both gas phase and implicit solvent calculations.

Based on a combination of semi-empirical and first-principles computational analyses, and with gas phase, implicit solvent, and QM/MM calculations, we proposed new details of the mechanism of Asn cyclization catalyzed by the cleavage mutant of the Mtu recA mini-intein that account for the increased C-terminal cleavage activity at low pH [28]. Figure 3.1 shows important reaction steps. This mechanism involves the protonation of the nitrogen of the scissile peptide bond by a vicinal hydronium ion (H3O+). This stretches that peptide bond because of the loss of π-bond resonance and the consequent increase in carbon electrophilicity [72]. Asn cyclization and subsequent succinimide formation then occur, resulting in peptide bond cleavage. Our results are consistent with the experimental observations of Wood et al., which indicate a simple proton-catalyzed reaction [19]. The frozen internal coordinate, called the Asn cyclization distance, is the distance between the Asn side-chain N atom and the carbonyl C of Asn on the scissile peptide bond (shown with an arrow in Figure 3.1B).

Figure 3.1: The proposed N-protonation reaction scheme for Asn cyclization to succinimide; R and R' represent the intein and C-extein, respectively.

Given current computational resources, detailed quantum mechanical calculations are limited in system size. In the present context, where the mechanism of enzymatic catalysis is of interest, one may choose a larger number of atoms and perform only a few calculations, or choose a smaller number of atoms comprising the relevant part of the system and explore a variety of possible reaction pathways. Including a larger number of atoms not only adds to the computational expense, but could also allow conformational rearrangements that are inconsistent with the protein structural context being studied. As a compromise, and based on available crystal structure knowledge, our initial system includes 25 atoms: the C-terminal C1-Asn8 side chain, the backbone atoms of the penultimate C1-His7 and of the C-terminal Cys+1 (with dangling bonds passivated by hydrogen atoms), and one explicit water molecule. Additional calculations with larger systems containing 50+ atoms, and with a full-protein quantum mechanics/molecular mechanics (QM/MM) treatment, are described in Chapter 4 together with the energetic results they yield.

Figure 3.2: PM3 reaction coordinate. The system before N-protonation is shown (A), and the reaction coordinate for the energy profile shown in Figure 3.3 is highlighted (B).



3.2 Semi-empirical analysis 

3.2.1 Description of model system 

The PM3 Hamiltonian was used to scan the potential energy surface in two dimensions. Given that the doubly protonated state of the amide nitrogen is a likely starting point of the C-terminal cleavage reaction at low pH, we performed PM3 geometry optimizations for 428 independent points in the two-dimensional space spanned by the reaction coordinates x and y (shown in Figure 3.2). Coordinate x, referred to as the Asn ionization distance, is the distance between the oxygen atom of the water molecule and the hydrogen atom of the Asn side chain. Coordinate y, the Asn cyclization distance, is the separation between the Asn side-chain nitrogen and the peptide carbonyl carbon.

3.2.2 Reaction coordinate space 

The space of reaction coordinates for the semi-empirical calculation is shown in Figure 3.2, and the corresponding energies for the two-dimensional energy scan are shown in Figure 3.3. The Asn ionization distance, x, ranges from 0.9 to 3.0 Å (x = 2.0 Å corresponds to a typical hydrogen bond, whereas x = 1.0 Å indicates the deprotonated Asn side chain and a re-formed hydronium ion). The Asn cyclization distance, y, ranges from 1.4 to 3.5 Å (y = 3.5 Å is the relaxed distance in the initial state, whereas y = 1.5 Å indicates fully cyclized Asn). The initial state (x = 3.0 Å, y = 3.5 Å), located at the top right of the plot, is chosen as the zero of the relative reaction energy. The final product state is located in the bottom left corner of the graph (x = 1.0 Å, y = 1.5 Å) and corresponds to cyclized Asn (succinimide), a re-formed hydronium ion, and a cleaved peptide bond.

Figure 3.3: Semi-empirical (PM3) energy surface for the reaction coordinates shown in Figure 3.2 (horizontal axis: x, Asn ionization distance, Å). Filled circles indicate the reactant and product states, and the arrows indicate the proposed reaction path. The dotted ellipse indicates the general location of the energy barrier.

3.2.3 Reaction path 

The path marked by arrows in Figure 3.3 indicates the likely path followed by the C-terminal cleavage reaction. Along that path, y is first reduced significantly, followed by a reduction in x. The reduction in y may occur along various combinations of the paths shown in Figure 3.3, because that region of the energy landscape is relatively featureless. In any case, cyclization of Asn appears to be almost complete before ionization of the side-chain nitrogen takes place. The barrier region is located near x = 1.6 Å, y = 1.6 Å, as indicated by the ellipse in Figure 3.3, and lies about 25 kcal/mol above the reference state. Figure 3.3 also shows that alternative paths, in which Asn ionizes before it cyclizes, are highly unlikely, as they sample regions of considerably higher energy.
We note that the PM3 geometry optimizations do not converge for certain choices of x and y (open squares in Figure 3.3). Most of these points are neighbored by points of higher energy and therefore should not affect the general conclusions drawn above.
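The reaction path in Figure 3.3 was identified by inspecting the surface. One common way to automate the same question on a gridded scan is a minimax (bottleneck) path search: among all paths from reactant to product, find the one whose highest point is lowest. The Python sketch below is our illustration of that technique, not the procedure used in this work. It treats non-converged points as impassable and allows moves to the eight neighboring grid points. The toy grid is hypothetical.

```python
import heapq
import math


def minimax_path(grid, start, goal):
    """Lowest-barrier path on a 2D energy grid (bottleneck variant of Dijkstra's algorithm).

    grid[r][c] holds an energy; math.nan marks non-converged points, which are
    treated as impassable. Returns (barrier, path) or (math.inf, []) if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: grid[start[0]][start[1]]}
    prev = {}
    heap = [(best[start], start)]
    while heap:
        barrier, cell = heapq.heappop(heap)
        if cell == goal:
            path = [cell]
            while cell in prev:
                cell = prev[cell]
                path.append(cell)
            return barrier, path[::-1]
        if barrier > best.get(cell, math.inf):
            continue  # stale heap entry
        r, c = cell
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                e = grid[nr][nc]
                if math.isnan(e):
                    continue
                cand = max(barrier, e)  # highest energy met so far along this path
                if cand < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = cand
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (cand, (nr, nc)))
    return math.inf, []


if __name__ == "__main__":
    nan = math.nan
    toy = [[0.0, 30.0, 40.0],
           [10.0, nan, 35.0],
           [12.0, 20.0, -5.0]]
    print(minimax_path(toy, (0, 0), (2, 2)))
```

Because the barrier along a path can only stay the same or rise as the path is extended, the first time the product point is removed from the heap its barrier is already the lowest possible.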

3.3 First principles reaction study 
3.3.1 Gas phase molecular system 

As will be apparent in the discussion below, protonation of the backbone nitrogen of the scissile peptide bond is a necessary first step in the reaction at low pH. To this end, Figure 3.4 considers three scenarios: (i) a neutral peptide, in which neither atom is protonated; (ii) O-protonation of the carbonyl oxygen; and (iii) N-protonation of the peptide nitrogen [72]. Specifically, we study the consequences of these three scenarios for the system energy and for the equilibrium length of the scissile peptide bond. In a neutral peptide (scenario i), the relaxed peptide bond length is 1.35 Å. As Asn cyclization proceeds (i.e., as y is reduced), the system energy increases significantly (Figure 3.4, top graph). Correspondingly, there is only a slight increase in the peptide bond length (bottom graph), indicating that the bond remains essentially intact.

When the carbonyl oxygen atom is protonated (scenario ii), the relaxed peptide bond length in fact decreases to 1.32 Å, as expected from the increased π-conjugation (i.e., double-bond character) between C and N. The Asn cyclization energy (Figure 3.4, top graph) in this case is lower than that for the neutral peptide; however, the peptide bond is significantly more stable and remains essentially intact as Asn cyclization proceeds.

In contrast, when the peptide nitrogen atom is protonated (scenario iii), the relaxed peptide bond length increases to 1.51 Å, indicating a weakening of that bond. As Asn cyclization proceeds, this distance increases further and the bond breaks, because a doubly protonated nitrogen makes a good leaving group (Figure 3.4, bottom graph). The cyclization energy (Figure 3.4, top graph) in this case is lower than that for the neutral peptide and similar to that for O-protonation, which does not lead to peptide bond cleavage (see above). Collectively, these preliminary calculations indicate that protonation of the amide nitrogen is an important first step for C-terminal cleavage in low pH environments.

The protonation of the nitrogen of the scissile peptide bond proposed above leaves that nitrogen atom transiently doubly protonated. In the broader context of enzymatic catalysis, this proposal is not new. Indeed, in the hydrolysis of a peptide bond by serine proteases, the nitrogen of the scissile peptide bond accepts a proton from the His of the catalytic triad [73]. In their study of the reaction catalyzed by HIV-1 protease, Trylska et al. found that protonation of the amide nitrogen was essential for peptide bond cleavage [74]. Similarly, protonation of the amide nitrogen was found to be the essential step in the hydrolysis of formamide, which was used as a computational model for peptide bond hydrolysis [75].

Figure 3.4: Schematic of the three scenarios considered for Asn cyclization: (i) normal peptide, (ii) O-protonation, and (iii) N-protonation. Asn cyclization energies (top graph) and peptide bond stretching (bottom graph) versus the Asn cyclization distance, y, for the neutral peptide bond system (□), for the system where the carbonyl oxygen is protonated (△), and for the system where the peptide nitrogen atom is protonated (■). Gas phase structures are optimized at the B3LYP/6-31G(d,p) level.

The observation made above that Asn cyclization proceeds before ionization of its side chain is supported independently by higher-level quantum calculations at the B3LYP/6-31G(d,p) level. Specifically, we followed the ionization of Asn by gradually transferring the proton from the side-chain nitrogen to the vicinal water molecule (Figure 3.5, top panel). These calculations also included the effect of the dielectric constant of the local environment, which was assumed to be equal to 1 in the (gas phase) PM3 calculations.

3.3.2 Energetic results 

The graph in the bottom panel of Figure 3.5 shows that both the electronic energy (E) and the Gibbs free energy (G) for ionization of the Asn side chain are rather high, ~31 kcal/mol and ~35 kcal/mol, respectively, even in a highly polar medium such as water. The intein active site is expected to have a dielectric constant lower than that of water, and therefore the relative energy of ionization will be even higher. We note that the Asn cyclization distance, y, is relaxed in these calculations and does not decrease significantly; thus, Asn cyclization will require additional energy. In contrast, as shown later, side-chain ionization is almost spontaneous once the Asn side chain has cyclized and formed a succinimide, consistent with the experimental observations of enhanced cleavage at low pH by Wood et al. [19, 23].

3.3.3 Mechanistic details 

Collectively, the above calculations allow us to propose a reasonably detailed C-terminal cleavage reaction mechanism at low pH, in which the six states shown in Figure 3.6 are particularly important. Figure 3.6A shows the hydronium ion in the context of the relevant part of our intein system. The second state involves the donation of a proton by the hydronium ion to the peptide nitrogen, resulting in a water molecule and the N-protonated state (Figure 3.6B). Asn cyclization is shown in Figure 3.6C and D, where the Asn side chain still has two protons. The explicit water molecule is adjacent to the peptide nitrogen in one case (Figure 3.6C) and moves to accept a proton from the forming succinimide in the other (Figure 3.6D). The formed succinimide, with the proton passed back to the hydronium ion, is shown in Figure 3.6E, whereas the final product is shown in Figure 3.6F: water is re-formed, and the extein segment leaves with a terminal -NH2 group.

Figure 3.5: Energy of ionization of the Asn side-chain nitrogen (top panel) calculated in solvents of different dielectric constants: vacuum (ε_r = 1.000), argon (ε_r = 1.43), benzene (ε_r = 2.247), chlorobenzene (ε_r = 5.621), dichloroethane (ε_r = 10.36), ethanol (ε_r = 24.55), methanol (ε_r = 32.63), and water (ε_r = 78.39). Each implicit solvent has unique parameters such as radius and density. Note that the peptide nitrogen has only one proton in this case. Gas phase coordinates, energy in solvent (○); solvent-optimized coordinates, energy (□); and Gibbs free energy (△).

Figure 3.6: Important states in the proposed hydronium-ion-catalyzed Asn cyclization and peptide bond cleavage mechanism (see text for details).

The vicinal water molecule plays an important role in this mechanism and is used 
both as an acid (state A/B) and a base (state D/E). Indeed, succinimide with NH2 is 
highly acidic due to the resonance effect of amide bonds on either side of that nitrogen. 
As a result, the nitrogen readily gives a proton to a nearby water molecule (state F). 

The energies corresponding to the various states (A to F) presented in Figure 3.6 are shown in Figure 3.7, where state Z is included as a reference. The transition from Z to A corresponds to a positively charged His and a water molecule (Z) forming a neutral His and a hydronium ion (state A). The energy barrier from B to C/D corresponds to Asn cyclization, with the second proton still attached to the Asn side chain. Indeed, the normal mode corresponding to the single imaginary frequency in state C was in the direction of bond formation, as expected. For this system, the energetics suggest that the nitrogen will give its second proton to water to re-form the hydronium ion (Figure 3.6E). As discussed above, in the absence of protonation of the peptide nitrogen (e.g., in the case of a neutral peptide), Asn cyclization has a higher barrier.

Figure 3.7: Electronic energies and approximate Gibbs free energies for states A through F in Figure 3.6 at the critical reaction points (energies are relative to state B). State Z is shown as a reference and corresponds to a positively charged His and a water molecule. Figure key: vacuum-optimized coordinates, energy (□) and Gibbs free energy (○); water-optimized coordinates, energy ( ) and Gibbs free energy (▽). Single point energies calculated with B3LYP/6-311++G(2d,p) using the coordinates optimized in implicit water without the diffuse functions (◇). Energies of intermediate structures between the states in Figure 3.6 optimized in the gas phase (■) are also shown.

Figure 3.7 also highlights the effect of accounting for the dielectric constant of the environment of the intein active site. As expected, the gas phase energy barrier (~33 kcal/mol) is reduced to ~25 kcal/mol in an implicit solvent with a high dielectric constant. When tested with MP2, the energy barrier was found to be 29.4 kcal/mol at the MP2/aug-cc-pVDZ level. Since the crystal structures of active inteins do not include exteins, the conformations of the exteins and of the N- and C-terminal active sites are unknown. Hence, the intein plus extein system used in the QM/MM calculations will require additional verification before it can be compared with the precursor inteins used in experiment. The actual dielectric constant of the protein interior lies between those of bulk water and vacuum, and hence our calculations suggest that the energy barrier for C-terminal cleavage lies between 25 and 33 kcal/mol. The experimental value is ~21 kcal/mol at pH 6.0.
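To put these barriers on a common footing with experiment, it helps to convert them into rates with the Eyring equation of transition state theory, k = κ(k_B T/h) exp(-ΔG‡/RT). The sketch below assumes a transmission coefficient κ = 1 and a temperature of 298.15 K. It also treats every quoted barrier as an activation free energy, which is itself an approximation, since some of the values above are electronic energies. Under these assumptions, a 4 kcal/mol difference in barrier corresponds to a rate ratio of nearly three orders of magnitude.

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def eyring_rate(dg_kcal, temperature=298.15, kappa=1.0):
    """First-order rate constant (1/s) from an activation free energy in kcal/mol."""
    return kappa * KB * temperature / H * math.exp(-dg_kcal / (R_KCAL * temperature))


def half_life_seconds(k):
    """Half-life of a first-order reaction with rate constant k (1/s)."""
    return math.log(2.0) / k


if __name__ == "__main__":
    # Barriers quoted in this section, all treated here as activation free energies.
    barriers = [("experiment", 21.0), ("implicit water", 25.0), ("MP2", 29.4), ("gas phase", 33.0)]
    for label, dg in barriers:
        k = eyring_rate(dg)
        print(f"{label:15s} dG = {dg:5.1f} kcal/mol  k = {k:9.3e} 1/s  t1/2 = {half_life_seconds(k):9.3e} s")
```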

We note that these initial calculations have several limitations. In the actual intein system, the overall protein structure (including both the intein and exteins) surrounding the active site provides a significantly greater structural and chemical context for the reaction. Also, there will likely be more than one water molecule in the vicinity of the active site that could mediate the C-terminal cleavage reaction. Our system, in contrast, is significantly smaller owing to computational limitations. In addition, our calculations are by necessity static and ignore the conformational and water exchange dynamics that are important in enzymatic catalysis [76]. These types of concerns are shared by most (if not all) quantum calculations of enzymatic reactions. Nevertheless, these initial calculations provide a plausible mechanism for C-terminal cleavage, which is tested in the next chapter using larger computational systems and improved multiscale methods.

It should be noted that the possibility of protonation by His is not excluded by the 
mechanism proposed here. For example, positively charged His could donate a proton to
the peptide nitrogen via a water molecule; the reaction can then follow steps similar to 
the ones outlined here. 



CHAPTER 4 

C-TERMINAL CLEAVAGE - EFFECT OF +1 EXTEIN RESIDUE

The previous chapter described preliminary results in determining the low pH reaction mechanism for intein C-terminal cleavage. Here, the C-terminal cleavage reaction mechanism is discussed further in the context of a larger computational system (at least 45 atoms), as well as within the framework of the coupled-Hamiltonian quantum mechanics/molecular mechanics (QM/MM) approach.

4.1 Non-essential mutation 

Once splicing was inhibited, the downstream Cys residue (the first amino acid of the C-terminal extein, or C-extein) was found to be functionally unnecessary for the C-terminal cleavage mechanism. Interestingly, Wood et al. observed that this amino acid regulated the reaction rate but did not alter the mechanism [23]. Furthermore, since the CM was found to be exceedingly reactive at low pH, Wood et al. [24] used Met, the native N-terminal residue of the protein that formed the C-extein sequence, to decrease the reaction rate by an order of magnitude. In this experiment, three proteins of various sizes were compared, each differing only by the Cys/Met C-extein mutation: thymidylate synthase (31.5 kDa), Hfq protein (18 kDa), and rh aFGF (14 kDa). For these proteins, the Cys to Met mutation decreased the reaction rate by factors of 12.0, 5.0, and 7.8, respectively [24, 26]. Figure 1.3 shows a schematic of the intein precursor and products based on these results [10, 27], although the exact mechanisms that govern the splicing and cleavage reactions are not understood at the atomic level. In particular, the effect on the reaction rate of the single amino acid mutation at C+1, flanking the conserved C1: His7-Asn8 dipeptide at the intein terminus, is not understood.
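Within transition state theory, and assuming the pre-exponential factor is unchanged by the mutation, a rate ratio maps onto a barrier difference through ΔΔG‡ = RT ln(k_Cys/k_Met). The sketch below applies this relation to the three reduction factors quoted above. The temperature of 298.15 K is our assumption, since the experimental temperature is not restated here. The result shows that the Cys/Met effect corresponds to only about 1 to 1.5 kcal/mol, a small difference that the calculations must resolve.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

# Cys-to-Met rate reduction factors quoted above [24, 26]
RATE_RATIOS = {"thymidylate synthase": 12.0, "Hfq protein": 5.0, "rh aFGF": 7.8}


def barrier_difference(rate_ratio, temperature=298.15):
    """ddG (kcal/mol) = RT ln(k_fast / k_slow), assuming equal pre-exponential factors."""
    return R_KCAL * temperature * math.log(rate_ratio)


if __name__ == "__main__":
    for protein, ratio in RATE_RATIOS.items():
        print(f"{protein:22s} k_Cys/k_Met = {ratio:4.1f}  ddG = {barrier_difference(ratio):.2f} kcal/mol")
```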

To obtain an atomic-level understanding of the effect of mutation on the reaction barrier, detailed quantum mechanical calculations of the intein C-terminal cleavage reaction have been carried out [28]. Simulations were based both on fully quantum mechanical analysis of a molecular model and on a hybrid quantum mechanics and molecular mechanics (QM/MM) approach. In the QM/MM approach, the entire protein and solvent are treated classically with parameterized force fields in a molecular mechanics (MM) calculation, as shown in Figure 4.1(a). The 53-atom C-terminal catalytic site (C1-block: His-Asn-Cys, or His-Asn-Xxx, where Xxx is an alternate amino acid) was treated with quantum mechanics (QM) and is shown in Figure 4.1(b).

The computed energy barrier was smaller for the C-terminal sequence His-Asn-Cys than for the His-Asn-Met mutant, consistent with experimental observations [24, 26]. The difference in energy barrier between the Cys and Met residues was due to the difference in electron affinity of the two amino acids. In addition to Cys and Met, several other amino acids at the first C-extein position (C+1) were studied here. The energy barrier for C-terminal cleavage, calculated with a larger model system, was confirmed to agree with experiment.

4.1.1 Classical protein system 

Starting from the crystal structure of the Mtu recA intein (ΔΔIhh-CM, PDB code 2IN8), a product protein without exteins, N- and C-terminal exteins were computationally added and then equilibrated with classical molecular dynamics (MD) simulations. The N-extein sequence consisted of Ace-Val-Val-Lys-Asn-Lys and the C-extein sequence consisted of Cys-Ser-Pro-Pro-Phe-Nme, both based on the native extein sequences [77]. Ace and Nme were capping residues for the N- and C-terminal exteins, respectively. AMBER force field parameters [78] were used with the GROMACS code [79]. MD simulations were carried out for 4 ns (0.5 ns equilibration, 3.5 ns production run) at a temperature of 298 K and a pressure of 1 bar, with 9548 water molecules for the Cys system and 9549 for the Met system.

4.1.2 Tripeptide subsystem 
4.1.2.1 Description of model system 

The tripeptide active site system (His-Asn-Cys) is highlighted in the view of the full intein crystal structure in Figure 4.1(b). Gas phase calculations were used to study the effect of site-directed mutagenesis (see Figure 4.2). Intein crystal structures usually include a hydrogen bond between the imidazole N-H of the (penultimate) His side chain and the carbonyl O of Asn, the final amino acid of the intein [12] [13] [14] [15] [16]. Although the penultimate intein His residue has previously been assumed to be the proton donor for the C-terminal cleavage reaction in the context of splicing [71], further inspection revealed that this was not the case for pH dependent C-terminal cleavage. For a simple proton-catalyzed reaction, the logarithm of the rate decreases linearly with increasing pH, and this behavior was observed experimentally for the C-terminal cleavage reaction [24]. Since the ability of His to act as an acid depends on its local pKa value, His-mediated catalysis would instead give a non-linear, specifically sigmoidal, pH-rate curve, in contrast to the linearity observed experimentally.

Figure 4.1: The intein cleavage mutant (CM) crystal structure (PDB code 2IN8) with computationally added exteins (a). The C-terminal catalytic site (His-Asn-Cys + two water molecules) is highlighted (b).

Figure 4.2: The C1-block His-Asn-Xxx active site. The highly conserved H-bond is shown with a dotted line, the cyclization coordinate of Asn is shown with an arrow, and the scissile peptide bond is shown with a wavy line. Side chains for Cys and Met are shown, although Ala, Val, Thr, and Ser were also considered.
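The pH argument made earlier in this subsection can be illustrated numerically. The sketch below is a generic kinetic illustration, not the model used in this work: it compares specific acid catalysis (rate proportional to [H3O⁺]) with a rate proportional to the protonated fraction of a single ionizable acid. The pKa of 6.0 is a hypothetical value chosen only because it is near that of a typical free His side chain, and the rate constants are arbitrary.

```python
import math

def log10_rate_hydronium(pH, k_H=1.0):
    """Specific acid catalysis: k_obs = k_H * [H3O+] = k_H * 10**(-pH)."""
    return math.log10(k_H) - pH

def rate_single_acid(pH, pKa, k_max=1.0):
    """Rate proportional to the protonated fraction of one acid group,
    1 / (1 + 10**(pH - pKa)), i.e. a titration (sigmoidal) curve in pH."""
    return k_max / (1.0 + 10.0 ** (pH - pKa))

PKA_HIS = 6.0  # hypothetical pKa, for illustration only

for pH in (3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0):
    print(f"pH {pH:.1f}: log10 k(H3O+) = {log10_rate_hydronium(pH):6.2f}  "
          f"log10 k(His) = {math.log10(rate_single_acid(pH, PKA_HIS)):6.2f}")
```

On a log scale the hydronium curve is a straight line of slope -1 at every pH, while the single-acid curve is essentially flat well below its pKa and bends toward slope -1 only above it; on a linear rate axis the latter is the sigmoidal titration curve referred to in the text.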

The proposed N-protonation mechanism begins with the protonation of the peptide N by a hydronium ion (H3O⁺; see Chapter 3). This in turn causes the scissile peptide bond to elongate, and hence reduces the energy necessary for peptide bond cleavage after Asn cyclization. After Asn cyclization and aminosuccinimide formation, the extra proton passes to the cleaved C-extein N-terminus (-NH2), which is excised and leaves with a positive charge (-NH3⁺; see Chapter 3). Although O-protonation was more energetically favorable for a generic, fully solvent-exposed peptide, in the intein C-terminal active site the carbonyl O was strongly hydrogen bonded to the imidazole N-H of His and also pointed inward, toward the core of the protein and away from the main body of solvent. Asn cyclization after O-protonation, instead of N-protonation, has been shown to require more energy and does not lead to cleavage of the peptide bond [28].

Prior to the QM/MM full protein study, the His-Asn-Cys tripeptide system (Figure 4.2) was studied with an isolated gas phase reaction*. Certain constraints were included to ensure that the backbone structure reflects that of the protein crystal structure: both terminal backbone atoms were geometrically fixed in the crystal structure configuration, both dihedral angles were constrained to values found in the crystal structure and throughout the classical molecular dynamics trajectories, and the hydrogen bond between the imidazole N-H of His and the carbonyl O of Asn was constrained at a distance of 1.8 Å. Without these constraints, the subsystem would likely rearrange into a structure that minimizes the gas phase energy but does not represent the intein C-terminal structure. By contrasting the effects of mutations, electronic structure properties at critical points were studied, including those at the purely quantum mechanical transition state.

4.1.2.2 Energetic results 

For the N-protonation mechanism calculated with the tripeptide system, the computational energy barrier for the His-Asn-Cys system in the gas phase was 27.95 kcal/mol, in good agreement with the experimental value of ~21 kcal/mol [24]. For a system roughly 30 atoms smaller, the previous gas phase energy barrier was ~33 kcal/mol [28]. This difference indicates that even the most basic approximation of the tertiary structure is important for accurate prediction of certain reaction energy barriers, as we will see with the QM/MM reaction. Additionally, we have tested and confirmed that the hydrogen bond between the imidazole N-H of His and the carbonyl O of Asn (dotted line in Figure 4.2) prevented O from accepting a proton from H3O⁺. This hydrogen bond is usually found at the C-terminus of inteins and is important for reducing the possibility of proton transfer to the carbonyl O. In fact, proton transfer from H3O⁺ to a carbonyl O atom, normally highly exothermic, becomes endothermic when O is hydrogen bonded with another group [80].

Table 4.1 summarizes the calculated energy barriers and relative rate constants for the gas phase tripeptide system with several His-Asn-Xxx mutations. Because of polarity and geometrical effects from the additional atoms, the gas phase energy barrier with Xxx = Cys (27.95 kcal/mol) was lower than the barrier previously calculated for a smaller system (33 kcal/mol [28]). As expected, the larger system used here more closely matches the experimental value of 21 kcal/mol, owing to the additional mechanical and electronic influences of nearby protein and solvent groups.

*Gas phase energy barriers are typically higher than barriers that include electrostatic contributions, such as those from implicit solvent calculations.
Table 4.1: Tripeptide energy barriers (ΔE) for various C-extein mutations (His-Asn-Xxx), percent change (%ΔE) from the His-Asn-Cys energy barrier, and expected change in reaction rate $k_{rel}$ compared to His-Asn-Cys. Structures were geometrically optimized at the B3LYP/6-311++G(d,p) level of theory. The percent change in the energy barrier is $\%\Delta E = \frac{\Delta E_{Xxx} - \Delta E_{Cys}}{\Delta E_{Cys}} \times 100\%$. Reaction rates are relative to the His-Asn-Cys wild type at T = 310.15 K (37 °C). The Arrhenius equation was used to compare the relative reaction rates of two mutants: $k_{rel} = k_1/k_2 = e^{-(\Delta E_1 - \Delta E_2)/RT}$, where $k_i$ and $\Delta E_i$ are the reaction rate and energy barrier for the $i$th mutant, respectively, R is the gas constant, and T is the temperature in kelvin.



Mutant (Xxx)   ΔE [kcal/mol]   %ΔE    k_rel
Cys            27.95            0.00  1
Thr            27.56           -1.39  1.88
Ser            27.75           -0.71  1.38
Ala            28.64            2.46  0.32
Val            28.97            3.64  0.19
Met            29.58            5.83  0.07



The energy barrier of the His-Asn-Met system was 1.63 kcal/mol higher than that of the His-Asn-Cys system, corresponding to a 5.83% increase in the energy barrier. When Cys was mutated to Met, the relative C-terminal reaction rate was predicted to be 0.07 times that of Cys, a decrease of more than an order of magnitude (about 14-fold), which is consistent with experimental results [24] [26]. Interestingly, this model predicts that Thr and Ser will be slightly more effective than Cys at pH-dependent C-terminal cleavage, a prediction that is consistent with the +1 position being occupied by Cys, Thr, or Ser in nature and that remains to be tested experimentally. In the context of splicing, experiments have shown that Cys, Ser, and Thr are the only amino acids able to complete the transesterification step of splicing [5], consistent with their also being the most efficient at C-terminal cleavage according to the calculations presented here.
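The Table 4.1 arithmetic can be checked with a short sketch. It applies the %ΔE and Arrhenius expressions from the table caption to the tabulated barriers, assuming equal pre-exponential factors and the CODATA gas constant in kcal/(mol K); the helper names are illustrative, and the two-decimal truncation is an inference from the tabulated numbers rather than something stated in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
T_BODY = 310.15       # K (37 °C), as stated in the Table 4.1 caption

# Tripeptide barriers from Table 4.1 (kcal/mol)
barriers = {"Cys": 27.95, "Thr": 27.56, "Ser": 27.75,
            "Ala": 28.64, "Val": 28.97, "Met": 29.58}

def percent_change(dE, dE_ref):
    """%dE = (dE_Xxx - dE_Cys) / dE_Cys * 100."""
    return (dE - dE_ref) / dE_ref * 100.0

def relative_rate(dE, dE_ref, T=T_BODY):
    """Arrhenius ratio k/k_ref = exp(-(dE - dE_ref)/RT), assuming equal prefactors."""
    return math.exp(-(dE - dE_ref) / (R_KCAL * T))

def truncate2(x):
    """Drop digits beyond the second decimal place (no rounding)."""
    return math.trunc(x * 100) / 100

ref = barriers["Cys"]
table = {m: (truncate2(percent_change(dE, ref)), truncate2(relative_rate(dE, ref)))
         for m, dE in barriers.items()}
for mutant, (pct, krel) in table.items():
    print(f"{mutant}: %dE = {pct:5.2f}  k_rel = {krel:.2f}")
```

Truncating to two decimals reproduces every entry of Table 4.1, whereas conventional rounding would change the last digit of several entries (for example, k_rel for Ala is about 0.326). The roughly 14-fold slowdown quoted for Met corresponds to 1/0.0710 ≈ 14.08.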

4.1.2.3 Charge analysis 

Natural Population Analysis (NPA) [81] was used to study the electron population and the partial atomic charges. Figure 4.3(a) illustrates the effect of amino acid mutation on the scissile peptide bond distance, and Figure 4.3(b) shows the sum of the NPA charges for the mutated C-extein residue, starting with the -NH at the scissile junction and including the side chain. The scissile bond distance and charge results are shown as a function of each mutant's energy barrier and include the normal amide, the N-protonated amide, and the transition state of the pH dependent C-terminal cleavage reaction. For the neutral amide, the C-N scissile peptide bond distance was 1.3492 Å for Cys, which decreased to 1.3455 Å for Met. Although this change was extremely small, it confirms that the amino acid side chain played a small but perceptible role in the properties of a normal peptide bond (which is well known from proton exchange experiments [82]). For the N-protonated amide and the Asn cyclization transition state, the correlation between a short scissile bond distance and a high energy barrier was more apparent: a shorter peptide bond implied more π-bond resonance between C and N, less π-bond resonance between C and O, and more energy required to break the C-N bond. An elongated peptide bond implied less π bonding between C and N and less energy necessary for peptide bond cleavage [72].

Figure 4.3: Relaxed scissile peptide bond distance (a) and NPA charges summed for atoms on the C-extein (b) for the tripeptide gas phase system, His-Asn-Xxx (Xxx = Thr, Ser, Cys, Ala, Val, Met). Both the scissile bond distance and the net charge for the C-extein amino acid (Xxx) are plotted as a function of the specific mutant's energy barrier and are shown for the normal amide (□), the N-protonated amide (○), and the Asn cyclization transition state (△).

A correlation between the energy barrier and the net charge can be seen (Figure 4.3(b)), especially for the Cys/Met mutation, signifying that residues able to accept more electrons exhibit a reduced energy barrier, whereas residues less likely or unable to accept electrons display an increased energy barrier.

4.1.3 Single amino acid molecules 

4.1.3.1 Electron affinity and ionization potential analysis 

To further elucidate how mutating the side chain of the first C-extein amino acid affects the energy barrier, the isolated Cys and Met amino acids were studied. The electron affinities (EA) and ionization potentials (IP) of each were calculated at the B3LYP/6-311++G(d,p) level of theory. The EA for Cys (the energy change when the system goes from neutral to negatively charged, so that a lower value indicates a more stable anion) was 6.79 kcal/mol. For Met, the EA was 8.27 kcal/mol, signifying that the side chain of the gas phase Cys residue was more electronegative than that of Met. Cys was more stable than Met when charged because of the bonding of each S atom. Although each side chain contained an S atom, in Cys the S atom was bonded to one carbon (CH2) group and one H atom, whereas in Met both bonds of the S atom were to carbon groups (CH2 and CH3), giving different electron occupation properties. On going from neutral to negatively charged, the partial charge of S in Cys changed from -0.01051 to -0.11874 units of charge, corresponding to the addition of 0.10823 electrons. For Met, the charge went from 0.16894 to 0.12532 units of charge, corresponding to a gain of only 0.04362 electrons. The S of Cys was thus able to accommodate more than twice as much delocalized electron population as that of Met, indicating greater energetic stability in the negatively charged system. The ionization potentials of the same isolated Cys and Met amino acids were also calculated: removing one electron required 203.05 kcal/mol for Cys and 191.14 kcal/mol for Met. Combining the fact that Met was more stable when an electron was removed with the fact that Cys was more stable when an electron was added, we conclude that the "electron pulling" and "electron pushing" properties of the first C-extein amino acid side chain must affect the properties of the scissile peptide bond.
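The electron-gain figures quoted for the S atoms follow directly from the reported NPA charges, as the sketch below shows. It assumes EA and IP are both reported as E(ion) - E(neutral), which is the convention under which the text's conclusions hold; the dictionary and function names are illustrative only.

```python
# Values reported in the text for the isolated amino acids (B3LYP/6-311++G(d,p)).
# EA and IP are treated here as E(ion) - E(neutral) in kcal/mol, so a lower
# value means the corresponding ion is more easily formed (assumed convention).
EA = {"Cys": 6.79, "Met": 8.27}
IP = {"Cys": 203.05, "Met": 191.14}

# NPA partial charge on S (units of e): (neutral, anion)
S_CHARGE = {"Cys": (-0.01051, -0.11874), "Met": (0.16894, 0.12532)}

def electrons_gained(q_neutral, q_anion):
    """A drop in partial charge equals the extra electron population on the atom."""
    return q_neutral - q_anion

gain = {aa: electrons_gained(*S_CHARGE[aa]) for aa in S_CHARGE}
print(f"S electron gain: Cys {gain['Cys']:.5f}, Met {gain['Met']:.5f}, "
      f"ratio {gain['Cys'] / gain['Met']:.2f}")
print("Anion favored for:", min(EA, key=EA.get))   # lower E(anion) - E(neutral)
print("Cation favored for:", min(IP, key=IP.get))  # lower E(cation) - E(neutral)
```

The Cys-to-Met ratio of S electron gain comes out at about 2.5, matching the "more than twice" statement, and the EA and IP comparisons single out Cys for the anion and Met for the cation.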

4.1.3.2 Energetic analysis of molecular orbitals near the Fermi energy 

For the isolated amino acids (Thr, Ser, Cys, Ala, Val, and Met), the highest occupied molecular orbital (HOMO) of the neutral system was compared with that of the negatively charged system. The energy difference between the HOMO of the electron-doped (negatively charged) system and that of the neutral system is termed the energy gap and is shown in Figure 4.4. In this analysis, the negatively charged amino acids were evaluated at the geometry optimized for the neutral charge state, and the amino acids are ordered by the energy barrier found when each is the mutant in the tripeptide system; the energy gap between the neutral and negatively charged molecules showed a clear trend. The energy gap was closely related to the electron affinity of the molecule: as the energy barrier for a particular mutant increased, the gap decreased. This single amino acid analysis is of particular interest because properties calculated for an isolated molecule representing an amino acid side chain, such as the electron affinity, the ionization potential, and the molecular orbital energy levels, may explain and perhaps predict the relative reaction rate for an untested mutant at the first C-extein position.

The localization of the EA densities for the molecules characterized in Figure 4.4 is plotted as a volumetric surface in Figure 4.5, which shows the difference in electron density between the neutral (optimized geometry) and negatively charged (single point geometry) single amino acid residues (Thr, Ser, Cys, Ala, Val, and Met). Added electron density on the side chain was observed for the amino acids that are more efficient at intein C-terminal cleavage when located downstream of the scissile peptide bond.
Figure 4.4: Energies of the highest occupied molecular orbital (HOMO) for the neutral system (■) and the negatively charged system (□) for the isolated amino acid molecules (Thr, Ser, Cys, Ala, Val, Met), shown in order of their energy barrier found independently for the tripeptide reaction calculation. The difference between these energies is the energy gap (•), which clearly depends on the energy barrier for the given mutant.

4.1.3.3 Tripeptide analysis 

Returning to the tripeptide system shown in Figure 4.2, Table 4.2 shows the electron populations of the peptide N atom in the 2s (l = 0) and 2p (l = 1) orbitals, as well as its total occupation. From the analysis of target atoms belonging to the scissile peptide bond, the expected differences in electron population between the Cys and Met mutants were observed. Specifically, the N atom in the Met mutant was generally more occupied with electrons than in the Cys mutant, which gave it a more negative charge.

For both mutants, the N atom showed a considerable increase in 2s electrons, which corresponded to C and other atoms returning σ electrons to N when the C-N bond was elongated after N-protonation. A similar σ electron back-transfer to N was found for peptide bond rotation, where at the 90° transition state the N atom lost π electrons while gaining σ electrons [72]; this phenomenon explains why N actually became more negative, as similarly seen in the present study. The 2p orbitals of N showed distinct differences between the Cys and Met mutants even for the neutral ground state (a normal amide), signifying that the side chains of adjacent amino acids are important in dictating the exact properties of the peptide bond.

Figure 4.5: The electron affinity (EA) density for single amino acid molecules (Thr, Ser, Cys, Ala, Val, and Met). The electron density surface describes the delocalization of the added electron when the system goes from neutral to negatively charged (Δρ). For downstream amino acids that were efficient at C-terminal cleavage (Thr, Ser, Cys), the EA density extended to the side chain. For amino acids that were less efficient (Ala, Val, Met), the EA density remained on the peptide-like part of the molecule, away from the side chain. Atom colors are as follows: carbon is cyan, nitrogen is blue, oxygen is white, sulfur is yellow, and hydrogen is white; the electron density surface is green [83].

For the normal amide, the charge of the peptide N was -0.616 for Cys and -0.641 for Met. For the N-protonated case, the charge of N was -0.660 for Cys, whereas for Met it was -0.710. For the transition state, the charge on N was -0.684 for Cys and -0.699 for Met. In all three cases the charge of N was more negative for Met than for Cys, which is consistent with the electron affinity calculation described previously. The side chain thus plays a subtle yet important role in the electrostatic environment during the cleavage reaction. With less negative charge on N, as for Cys, the -NH2 group is energetically more favored to leave. This electron population analysis revealed differences in the electronic structure of the scissile peptide bond for Cys and Met, which explain why the energy barriers for the Cys and Met mutants are distinct despite an identical mechanism.

Table 4.2: Atomic orbital populations for the 2s and net 2p orbitals, as well as the total electronic occupation, for the peptide N atom in the gas phase tripeptide calculation. N is generally less occupied by electrons for Cys than for Met, which is consistent with the single amino acid electron affinity results. The sum of the electron occupations of the 2px, 2py, and 2pz orbitals is written as 2p. The NPA charge is calculated by subtracting the total electron occupation from the atomic number; a larger electron occupation signifies a more negative charge.

Orbital   Mutant   Neutral ground state   N-protonated   Transition state
2s        Cys      1.250                  1.359          1.386
          Met      1.259                  1.360          1.376
2p        Cys      4.341                  4.285          4.277
          Met      4.357                  4.329          4.299
Total     Cys      7.616                  7.660          7.684
          Met      7.641                  7.710          7.699
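The NPA charges quoted in the text can be recovered from the totals in Table 4.2 using the rule stated in its caption (charge = atomic number - total occupation). The sketch below does this for every state. The check that the non-2s/2p remainder is about two electrons assumes that this remainder is mostly the N 1s core plus a small Rydberg population; that interpretation is an assumption, not a statement from the text.

```python
Z_N = 7  # atomic number of nitrogen

# Table 4.2 populations for the peptide N: (2s, 2p, total) per state
populations = {
    "Cys": {"neutral": (1.250, 4.341, 7.616),
            "N-protonated": (1.359, 4.285, 7.660),
            "TS": (1.386, 4.277, 7.684)},
    "Met": {"neutral": (1.259, 4.357, 7.641),
            "N-protonated": (1.360, 4.329, 7.710),
            "TS": (1.376, 4.299, 7.699)},
}

def npa_charge(total_occupation, Z=Z_N):
    """NPA charge = atomic number - total electron occupation."""
    return Z - total_occupation

for state in ("neutral", "N-protonated", "TS"):
    q_cys = npa_charge(populations["Cys"][state][2])
    q_met = npa_charge(populations["Met"][state][2])
    # Electrons outside 2s/2p, assumed to be mostly the N 1s core (~2 e)
    rest_cys = populations["Cys"][state][2] - sum(populations["Cys"][state][:2])
    print(f"{state:13s} q_N(Cys) = {q_cys:+.3f}  q_N(Met) = {q_met:+.3f}  "
          f"non-valence(Cys) = {rest_cys:.3f}")
```

For each state this yields -0.616/-0.641, -0.660/-0.710, and -0.684/-0.699 (Cys/Met), the values given in the text, with Met always the more negative.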

4.2 Reaction analysis with QM/MM calculations 

The full protein QM/MM reaction profile was initially calculated with a QM active site consisting of His-Asn-Cys and two water molecules (2346 protein atoms, 4161 water atoms, and 53 QM atoms in total) [84]. Figure 4.6 shows the QM/MM energy profile with and without electrostatic embedding. The energy barrier was 24.96 kcal/mol for the QM/MM calculation with geometry optimization, in excellent agreement with the 21 kcal/mol measured experimentally [24].

Figure 4.6: Combined QM/MM reaction energy profile (a) and distance of the scissile peptide bond during breakage (b) for the His-Asn-Cys plus two water QM system: QM/MM geometry optimization (■); QM/MM plus charge embedding single point energies (○).

4.2.1 Effect of mutation on energy barriers

The energy barrier difference for the Cys/Met mutation is of interest in the context of a QM/MM calculation, but because the Met side chain was too spatially extended to simply replace the smaller Cys side chain, additional classical MD simulations were performed (starting from the initial intein plus extein structure) with Met at the C-extein +1 residue. Once the full protein system was equilibrated, the QM active site was partitioned to be His-Asn-Met plus the two water molecules in the same location as before (59 total QM atoms). The Asn cyclization reaction coordinate was scanned after N-protonation by H3O⁺. To compare the effect of the Met/Cys mutation directly, the smaller Cys was substituted for Met, and the geometry was again relaxed. In this way, the reaction energies can be compared directly, because the Met and Cys systems share the same original protein structure.
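As a small consistency check on the QM region sizes, the 53- and 59-atom counts differ exactly by the atom count difference between the two side chains named in the next paragraph (-CH2-SH for Cys, -(CH2)2-S-CH3 for Met). The helper below is hypothetical and only does this bookkeeping.

```python
from collections import Counter

# Side-chain compositions from the text: -CH2-SH (Cys) and -(CH2)2-S-CH3 (Met)
SIDE_CHAINS = {
    "Cys": Counter({"C": 1, "H": 3, "S": 1}),
    "Met": Counter({"C": 3, "H": 7, "S": 1}),
}

def swap_side_chain(qm_atoms, old, new):
    """Atom count of a QM region after replacing one side chain by another."""
    return qm_atoms - sum(SIDE_CHAINS[old].values()) + sum(SIDE_CHAINS[new].values())

print(swap_side_chain(53, "Cys", "Met"))  # His-Asn-Met + 2 H2O QM region
```

The Cys side chain has 5 atoms and the Met side chain 11, so swapping one for the other changes the QM region by 6 atoms in either direction.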

These structures overlapped almost completely, with the exception of the side chain of the (+1) amino acid, either -CH2-SH for Cys or -(CH2)2-S-CH3 for Met. Using the B3LYP/6-31G(d,p) level of theory, independent reaction profiles were calculated for the Met and Cys variants. The barrier was 27.07 kcal/mol for Met and 26.17 kcal/mol for Cys. The His-Asn-Met QM active site (as part of the QM/MM system) thus had an energy barrier 0.90 kcal/mol higher than His-Asn-Cys, which corresponds to a ratio between reaction rates of k = k_Met/k_Cys = 0.22, in good agreement with experimental results and consistent with the tripeptide system conclusions [24] [26].
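The rate ratio for the QM/MM barriers can be reproduced with the same Arrhenius form used for Table 4.1, assuming equal pre-exponential factors. The temperature is not stated for this comparison, so the sketch evaluates two plausible choices: 298.15 K (close to the MD temperature) and 310.15 K (the Table 4.1 temperature). The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def rate_ratio(dE_slow, dE_fast, T):
    """k_slow / k_fast from the Arrhenius form, assuming equal prefactors."""
    return math.exp(-(dE_slow - dE_fast) / (R_KCAL * T))

barrier_met, barrier_cys = 27.07, 26.17  # QM/MM barriers from the text, kcal/mol
for T in (298.15, 310.15):
    print(f"T = {T} K: k_Met/k_Cys = {rate_ratio(barrier_met, barrier_cys, T):.3f}")
```

The quoted value of 0.22 is reproduced at 298.15 K (about 0.219), while 310.15 K gives about 0.232; either way the Met mutant is predicted to react roughly four to five times more slowly than Cys.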

4.2.2 Effect of mutation on electron occupation 

In addition to energy barriers, the Mulliken charge [85] was calculated for critical atoms†. For the N atom of the scissile bond in the ground state, the partial charge was -0.538 for Cys and -0.545 for Met. For the N-protonated state, the partial charge of N was -0.609 for Cys and -0.615 for Met. At the transition state, the charge was -0.584 for Cys and -0.598 for Met. In all cases the partial charge of the N atom was more negative for the Met mutant, which is consistent with the tripeptide results and can be explained by the electron affinity and ionization potential of the isolated Cys and Met amino acids. When the net Mulliken charge was summed over the C-extein residue (Cys or Met) in the QM/MM context for the normal amide ground state, the net charge was 0.225 for Met and 0.209 for Cys.

Within the QM/MM system, the charges of the backbone and side chain of the first C-extein residue were summed. The net charge of Cys was more negative (less positive) than that of Met, in agreement with the model QM calculations described in the preceding paragraphs.

By combining model system QM calculations and full-protein QM/MM simulations, the non-mechanistic regulation of reaction rate by single amino acid mutations near the active site was confirmed, explained, and predicted. Similar methods are also useful for testing an unknown mechanism against correlated experimental kinetic data (from site-directed mutagenesis of non-essential amino acids).

†Natural Population Analysis (NPA) is not implemented with QM/MM at this time.

4.3 Conclusions: C-terminal cleavage 

The C-terminal cleavage reaction and the previously proposed N-protonation mech- 
anism were tested by increasing the QM system size by 30 atoms to at least 53 atoms. 
In addition, full-protein QM/MM analysis was performed. The pH dependent C-terminal cleavage reaction proceeds by simple proton catalysis, in which a hydronium ion protonates the peptide N atom. The peptide bond, now resonance destabilized, is elongated and the
peptide C atom is open for attack by the Asn side chain. During Asn cyclization, the 
peptide bond cleaves while an aminosuccinimide ring is formed. The final step involves 
the donation of the extra proton on the aminosuccinimide to the -NH2 leaving group via 
water, thus making the leaving group positively charged. Our QM/MM results included 
the effects from the protein interior, both mechanical and electrostatic. 

The "non-mechanistic" role of the first amino acid of the C-extein was confirmed. This amino acid, although not necessary for C-terminal cleavage, changed the reaction rate by about an order of magnitude, as measured by Wood et al. [19] [24] [26]. In this study, the precise energy barrier for C-terminal cleavage (and hence the reaction rate) was shown to depend on the side chain of the amino acid downstream from the scissile bond. Differences in the electron occupation and partial atomic charges of each residue at the C+1 position led to distinct energy barriers, which were calculated and found to agree with experimentally observed reaction rates.



CHAPTER 5 
SPLICING 

5.1 Splicing introduction 

During protein splicing, an intein autocatalytically cleaves both its N-terminus and C-terminus and simultaneously ligates the flanking peptides (exteins) [5]. Inteins are protein segments that catalyze splicing within a host protein, joining the N- and C-exteins that flank the intein [1] [2]. On the basis of random and site-directed mutagenesis, the splicing reaction has been proposed to depend on several highly conserved residues [19] [20] [21] [22] located in non-gapped conserved regions (blocks) that are consistent for most canonical inteins. Conserved residues are highlighted in Figure 5.1. Despite this information, an atomic-level description of the splicing mechanism is not available.

Intein crystal structures [11] [12] [13] [14] [15] [16] [17] have shed light on the splicing and C-terminal cleavage mechanisms. One limitation in using crystal structures for mechanistic studies is that they are mostly product proteins that do not include exteins, which were excised during the splicing reaction. Additionally, there is often increased spacing between the N- and C-termini of the intein due to electrostatic interactions with solvent or metal ion binding. In general, the intein termini in crystal structures are considered to be geometrically flexible. Despite these concerns, intein crystal structures are critical for understanding many aspects of intein reactions. One intein (Ssp dnaE) was crystallized both as a product protein and as an engineered inactive precursor with exteins [17]. In this case, short exteins were crystallized with the intein, and the catalytic residues had to be substituted in order to inhibit the reaction. Therefore, the exact state of inteins before splicing is not clear, since observing the precursor requires engineering mutations that inhibit splicing. Nevertheless, the protein backbones of the inhibited precursor and the spliced product were reported to overlap.

Experiments on the splicing reaction are exceedingly complex because of the various possible products. For example, to measure the first step of splicing, namely N-terminal thioester formation, a typical construct is used in which the C-extein is not present (so no C-terminal cleavage is possible). Dithiothreitol (DTT) is added to cleave the N-terminal thioester, but this is not a precise probe of the process: without the C-extein, a proximal charged and flexible C-terminus is present that may affect the reaction steps and rates at the N-terminus. Because of this, experiments using DTT to measure N-terminal cleavage most likely encounter different chemistry from N-terminal cleavage within the context of splicing.

Figure 5.1: The conserved amino acids treated as the quantum mechanical active site are highlighted in a ribbon diagram of the crystal structure.

5.1.1 Splicing reaction 

The intein substrate is naturally tethered to its active site. This makes the catalytic mechanism different from that of typical enzymes, which must bind a separate substrate; here, the turnover number equals one. Because the substrate is tethered to the active site, it has been extremely difficult to isolate molecular inhibitors of the splicing reaction [86]. In vitro experiments have shown that zinc and copper ions inhibit splicing [87] [88]. Because zinc cannot enter E. coli cells, in vivo experiments with precursor or intermediate intein constructs inhibited by zinc are unlikely to be feasible.

Because of these factors, the splicing mechanism is not understood at the atomic level [89], and this has limited the biotechnological applications of inteins [90]. Splicing proceeds in four steps, and it is hoped that elucidating the intein splicing mechanism will lead to new schemes to either trigger or inhibit the reaction. Biotechnological applications include switches and delivery devices for bioseparations [6] [7], drug development [8], and molecular sensors [9] [10].

To date, the splicing reaction scheme, based on crystal structures [11] [12] [13] [14] [15] [16] [17] and mutagenesis information, is as follows. The thiol group (-SH) of the first intein residue (N1-Cys1) undergoes an N-S shift, in which it attacks the carbonyl carbon of the peptide bond between the N-extein and Cys1, thus releasing the peptide nitrogen. This intermediate state is called the N-terminal thioester. After N-terminal thioester formation, transesterification occurs. In this step, the C-terminal Cys (the first residue of the C-extein, not present in the crystal structures) attacks the carbonyl carbon of the thioester, thus joining the N- and C-exteins. Once transesterification is complete, the C-terminal Asn residue cyclizes into a succinimide and C-terminal cleavage occurs. The N- and C-exteins are now fused and the intein is released. The ligated product (N- and C-extein) then undergoes the final reaction, an S-N shift, which replaces the thioester with a more stable peptide
bond. Interestingly, the two Cys residues may be replaced by Ser‡ with similar reaction steps, although the reaction usually occurs at a slower rate because the Ser OH side chain is less acidic than the SH of Cys.

The reaction scheme described above does not include the explicit role and pathway of protons. Prior to N-terminal thioester formation, the thiol group must lose its proton and become a thiolate (-S⁻). Similarly, prior to transesterification, the C-extein thiol must be ionized. To be an adequate leaving group, the peptide nitrogen must carry two protons (-NH2). At biological pH, a stable N-terminal nitrogen carries three protons (-NH3⁺).
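The thiolate requirement, and the slower reaction with Ser, can be illustrated with the Henderson-Hasselbalch relation. The pKa values below are assumptions (approximate textbook values of about 8.3 for a free Cys thiol and about 13 for a Ser hydroxyl), not values from this study, and the intein microenvironment can shift them considerably; the sketch only shows the order-of-magnitude difference in ionized fraction at pH 7.4.

```python
def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of an acid HA present as A- at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

# Illustrative approximate pKa values (assumptions, not from this study)
PKA = {"Cys -SH": 8.3, "Ser -OH": 13.0}
PH = 7.4

for group, pka in PKA.items():
    print(f"{group}: ionized fraction at pH {PH} = {fraction_deprotonated(PH, pka):.2e}")
```

With these assumed values, roughly one Cys thiol in nine is present as the thiolate at pH 7.4, while the Ser alkoxide fraction is below 10⁻⁵, which is consistent in direction with the slower Ser-mediated reaction described above.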

5.1.2 Effects of mutation at highly conserved locations 

The limited understanding of the role of protons is based on mutagenesis experiments [19] [20] [21] [22] and X-ray crystallography studies [11] [12] [13] [14] [15] [16] [17].

‡The C-extein Cys residue can also be replaced with Thr.

The splicing reaction has been proposed to proceed via the following highly conserved residues of canonical inteins, which are shown in Figure 5.1:

• N1-block Cys (N1-Cys1), which undergoes an N-terminal N-S acyl shift, resulting in the N-terminal thioester. Ser or Thr may be used in certain constructs, with a noticeable impact on reaction rate.

• N3-block His (N3-His10) regulates N-terminal hydrolysis and is the most conserved of all intein residues.

• C2-block Asp (C2-Asp5) links the N- and C-termini at the active site. This coupling leads to successful splicing rather than independent N- and C-terminal cleavage events. When this residue is mutated to Gly, the intein is converted to the C-terminal cleavage mutant (CM), a highly efficient and pH dependent mutant [19] [24] [26] [29]. One major question remains: is splicing inhibited in CM, or is cleavage simply more efficient? A catalytic role for C2-Asp5 would explain why this residue is chemically necessary.

• C2-block His (C2-His12) may be a proton acceptor/donor for the C-terminal cleavage reaction in the context of splicing.

• C1-block C-terminal catalytic site and first C-extein residue (C1: His7-Asn8-Cys(+1)) are important for C-terminal cleavage. The penultimate His side chain forms a highly conserved hydrogen bond with the carbonyl oxygen, which is considered to be structurally important. One intein lacks the penultimate His and uses an Arg residue at a different position for a similar role [17]. The roles of Asn and the C-extein Cys are described in the C-terminal cleavage chapters.

• An important but poorly understood mutation is N3-Val4Leu (V67L), which converts the deletion variant mini-intein into a faster-splicing mutant (SM).

Extein sequence, too, may play a role in intein reactions [12] [14] [17]. The importance
of conserved residues in the exteins was shown for split inteins, specifically for residues 
near the scissile peptide bond at both the N- and C-termini [91]. Split inteins undergo 
trans-splicing, where the intein is split into an N-intein and C-intein that are connected to 
the N- and C-exteins, respectively. In this case, the separate intein fragments must come 
into contact and enter a conformation that is conducive to splicing, and the exteins may 
play a role in this conformational search. 
Here we present an atomic-level description of the intein splicing mechanism based on first principles density functional calculations.



5.2 Description of model system 

The splicing mutant (SM) crystal structure (PDB code 2IMZ) was computationally amended to include both an N- and a C-extein. The solvated system, with exteins present, was equilibrated with classical molecular dynamics (MD) simulations. To test the N-S shift, QM/MM calculations were performed. The protein consisted of 2351 atoms and 1321 water molecules were present, for a system total of 6314 atoms. The N-terminal active site is based on the N-extein residue Lys(-1) and the intein residues N1: Cys1-Leu2. Also included are N3-Thr7, N3-His10, and C2-Asp5. From the C1 block and C-extein (C-terminal active site), the residues Val6-His7-Asn8-Cys(+1)-Ser(+2) are included, where the peptide bond between C1-Asn8 and the C-extein Cys is cut during C-terminal cleavage. For non-mechanistic residues (those that affect the QM system only via polarization), the entire amino acid is not necessarily included in the QM system; for example, only the side chain of N3-Thr7 is included. Nine water molecules were included explicitly in the quantum mechanical calculations. Because the QM active site is extremely large (130+ atoms), it is extended or truncated for different calculations; for all energy calculations and comparisons, however, the systems being compared are identical in all atomic constituents (number of electrons, charge, spin, and nuclei).
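The atom counts quoted above are mutually consistent if each explicit water contributes three atoms. The short Python check below makes this arithmetic explicit; the three-site water assumption is ours, since the water model is not named in this section.

```python
# Sketch: recompute the total atom count of the solvated SM model (Section 5.2).
# Assumption (not stated in the text): each water is a three-site model (O, H, H).

PROTEIN_ATOMS = 2351       # protein atoms, as quoted in the text
WATER_MOLECULES = 1321     # explicit waters, as quoted in the text
ATOMS_PER_WATER = 3        # assumed three-site water


def total_atoms(protein_atoms: int, waters: int, atoms_per_water: int = ATOMS_PER_WATER) -> int:
    """Return the protein atom count plus all explicit water atoms."""
    return protein_atoms + waters * atoms_per_water


if __name__ == "__main__":
    n = total_atoms(PROTEIN_ATOMS, WATER_MOLECULES)
    print(f"total atoms: {n}")  # 6314, the total quoted in the text
```

Under this assumption, 2351 + 3 x 1321 = 6314, matching the quoted system total.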



5.3 First principles splicing mechanism 

Protein crystal structures allow for mechanistic predictions based on structural information. Random and site-directed mutagenesis studies are also helpful for mechanistic prediction because they show how the reaction may be inhibited, slowed, or sped up through mutation. Inteins are unique proteins because each can be thought of as its own enzyme, with the exteins as the substrate. The mechanistic details of this auto-catalytic behavior are encoded in the amino acid sequence. Because crystal structures of reactive inteins do not include exteins (with the exception of inteins engineered to be inhibited), and because the exteins are an important part of the mechanism, structural data alone cannot provide a complete atomic-level understanding of the splicing reaction. In particular, the role of protons, which drive acid/base chemistry, is not understood for splicing. Instead, experiments have identified certain amino acids as important or critical for splicing. We present a splicing mechanism that is based on first principles calculations and that utilizes the highly conserved amino acids.

^The absolute energy depends on many factors, including the density functional, basis set, system size, and polarization environment. Because the splicing reaction occurs over a large portion of the protein, the active site model system must be modified for different steps along the reaction path. All energies presented are relative to analogous systems.

Figure 5.2: Ground state for the intein splicing mechanism. Arrows indicate the direction of electron movement. Critical residues are shown; water molecules included in the calculation are omitted for clarity.



5.3.1 N-terminal thioester 

The ground state of the intein splicing system is shown in Figure 5.2. The proposed first step of splicing is the N-S shift, in which the N1-Cys1 thiol group (-SH) is ionized and the resulting thiolate attacks the carbonyl C of the upstream peptide bond. As the thioester C(=O)-S bond forms, the electrons shared between C and N (the peptide bond) are released and the C-N bond breaks. For this step, the peptide N atom requires protonation (-NH2 terminal) to be a sufficient leaving group.

Figure 5.3: The N1-Cys1 side chain is ionized to a thiolate and gives its proton to N3-His10.

The role of N3-His10 is first as a base and then as an acid. In its neutral configuration, this residue is positioned to accept a proton from the N1-Cys1 thiol group via water. This step of the splicing reaction is shown in Figure 5.3. The energy for this proton transfer (N1-Cys1 to N3-His10) is an upper limit for the N1-Cys1 ionization energy and was computed to be 19.27 kcal/mol. The computed ionization energy is a maximum because the thiol group is unlikely to exist in charged form except during the N-S shift reaction, and because the protein was not equilibrated to accommodate this proton transfer. Once the proton has moved from N1-Cys1 to N3-His10, and during thioester formation, the now positively charged N3-His10 residue is positioned to donate a proton directly to the scissile N-terminal peptide nitrogen (without the need for an intervening water molecule).

The energy difference between the system in which N3-His10 is positively charged and the system with a terminal NH2 on the newly formed thioester is 25.05 kcal/mol. These energy values are approximate because of the static nature of the calculations, which do not allow full-protein equilibration of the structure with the N1-Cys1 thiolate and positively charged N3-His10 intermediate. An additional intermediate state, with an energy of 18.95 kcal/mol, contains a ring structure in which the N is singly protonated and the carbonyl O is also protonated. Although this state shows no C-N bond breakage and is not necessarily on the splicing reaction path, it likely undergoes C(=O)-S cleavage with DTT. If this state were realized experimentally and were more stable than the intermediate structure on the reaction pathway, N-terminal cleavage via DTT could occur independently, outside the context of splicing.
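To put the static energies quoted above on a more intuitive scale, they can be converted to kJ/mol and to Boltzmann factors, exp(-ΔE/RT). The sketch below does this at an assumed temperature of 298.15 K. It is illustrative only: the quoted values are static electronic energies rather than free energies, each is relative to its own analogue reference system, and the dictionary labels are our own shorthand.

```python
import math

# Gas constant in kcal/(mol*K); 298.15 K is an assumed reference temperature.
R_KCAL = 1.987204e-3
T_K = 298.15
KCAL_TO_KJ = 4.184


def boltzmann_factor(delta_e_kcal: float, temperature: float = T_K) -> float:
    """exp(-dE/RT): relative population of a state lying dE above its reference."""
    return math.exp(-delta_e_kcal / (R_KCAL * temperature))


def kcal_to_kj(value_kcal: float) -> float:
    """Convert kcal/mol to kJ/mol."""
    return value_kcal * KCAL_TO_KJ


if __name__ == "__main__":
    # Static energies quoted in Section 5.3.1 (kcal/mol); labels are shorthand only.
    energies = {
        "N1-Cys1 -> N3-His10 proton transfer (upper limit)": 19.27,
        "N3-His10(+) vs. NH2-terminated thioester": 25.05,
        "ring intermediate (N and carbonyl O protonated)": 18.95,
    }
    for label, e in energies.items():
        print(f"{label}: {e:.2f} kcal/mol = {kcal_to_kj(e):.1f} kJ/mol, "
              f"Boltzmann factor {boltzmann_factor(e):.1e}")
```

At this temperature, about 1.36 kcal/mol corresponds to one order of magnitude in the Boltzmann factor, so energies of roughly 19 to 25 kcal/mol correspond to factors many orders of magnitude below unity.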

According to these calculations, the N-terminal thioester forms because of the combined acid/base properties of the N3-His10 residue at near-neutral pH: it first accepts a proton (from N1-Cys1, relayed through solvent water) and then donates a proton back to the N-terminus.

NH2 termination is necessary for an N-terminal leaving group, but it is not optimally stable. Based on typical pKa values, solvent-exposed NH2 terminal groups at neutral pH will be protonated again and positively charged (-NH3+). The average pKa of a solvent-exposed N-terminus depends on the downstream side chain and ranges from 8.8 to 10.8 [73]. For this reason, an additional proton was included in the system (charge and spin are constant input parameters) and was tested at several positions (N3-His10 being one of them). With the thioester present, a stable state was found for -NH3+ termination, which was important for understanding the splicing reaction at the atomic level.
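The Henderson-Hasselbalch relation makes the argument above quantitative: the fraction of N-terminal amines carrying the third proton is 1/(1 + 10^(pH - pKa)). The sketch below evaluates it across the quoted pKa range of 8.8 to 10.8, taking pH 7.0 as an assumed stand-in for "neutral pH"; it describes an idealized solvent-exposed amine, not the active-site environment.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a base present in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    PH_NEUTRAL = 7.0  # assumed value for "neutral pH"
    for pka in (8.8, 9.8, 10.8):  # ends and midpoint of the quoted N-terminal pKa range
        frac = fraction_protonated(pka, PH_NEUTRAL)
        print(f"pKa {pka:4.1f}: {100 * frac:.2f}% -NH3+")
```

Even at the low end of the range, roughly 98% of solvent-exposed N-termini are expected to be -NH3+ at pH 7.0 under these idealized assumptions.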

The termini were too far apart for the C-extein Cys+1 residue to attack the N-terminus. This could be because the crystal structure is of the product protein segment, whose ends are flexible, charged, solvent exposed, and possibly metal-chelated. For this reason, the N- and C-terminal active sites were adiabatically brought together after N-terminal thioester formation using both fully classical and semi-empirical/classical (QM/MM) simulations. After equilibration, a stable minimum was found with first principles methods; no constraint between the N- and C-termini was necessary, and their spacing of approximately 4.5 Å is sufficiently close for the transesterification step.

The intermediate structures described here are accessible only through simulation, and with them the atomic-level details of the splicing mechanism were observed. In particular, in this intermediate state (N-terminal thioester), the N-terminal -NH3+ group was in direct contact with both the highly conserved C2-Asp5 and the Cys+1 thiol group, which turned out to be essential for the splicing reaction. We call these three groups the catalytic triad (N-terminal -NH3+, C2-Asp5 -COO-, C-extein Cys+1 -SH), shown in Figure 5.4.

Figure 5.4: N-terminal thioester shown after protonation of -NH2 (to -NH3+) from solvent. The -NH3+ group is in electrostatic contact with Cys+1 and C2-Asp5 and catalyzes the transesterification step.



5.3.2 Mutation of C2-Asp5 

The C2-Asp5Gly mutation was experimentally shown to inhibit splicing activity and enhance C-terminal cleavage activity, the latter with an improved reaction rate at low pH [19] [24]. Gly was proposed to have one of two roles. First, the smaller volume of Gly may allow solvent to access the C-terminal active site, which would enhance C-terminal cleavage and make its rate strongly dependent on solvent pH. Second, the additional torsional freedom of Gly might give the folded protein more flexibility for C-terminal cleavage. Neither hypothesis explains why splicing is inhibited by the C2-Asp5Gly mutation. Our experimental mutagenesis results indicate that when Ala or Asn is present instead of C2-Asp5, splicing is not observed and C-terminal cleavage is enhanced (although still less rapid than with Gly) |lip23j. Because splicing is observed with Asp or Glu at this position but not with Ala or Asn, we conclude that the side chain at position 5 of the C2-block is chemically involved in the splicing mechanism and is essential as a proton acceptor and donor.

^The mutation formerly known as D422G.

Figure 5.5: The C-terminal Cys+1 experiences electrostatic repulsion from the N-terminal -NH3+ and therefore donates a proton to C2-Asp5.

Using first principles methods, the role of C2-Asp5 was determined. The Cys+1 thiol group (-SH) was in close contact with both C2-Asp5 and the N-terminal -NH3+ group. The direct contact between the Cys+1 SH group and the N-terminal -NH3+ group is an unfavorable electrostatic interaction between (partially) positively charged hydrogen atoms (see Figure 5.5). As a result, the thiol (-SH) group donated its proton to the carboxylate group of Asp, and the triad was stabilized with -S-, -NH3+, and -COOH in direct contact.

In summary, the C-terminal Cys+1 thiol group was positioned in contact with the -NH3+ group newly formed after N-terminal thioester formation. Because this electrostatic interaction was unfavorable, Cys+1 donated its proton to C2-Asp5, which goes from negatively charged to neutral. The resulting thiolate (-S-) was then activated for the transesterification step.

5.3.3 Decreased stabilization of -NH3+ with C2-Asp5Gly

In addition to the catalytic role of C2-Asp5, we have studied the effect of Gly at that position. With the C2-Asp5Gly mutation (see Figure 5.6), the Cys+1 thiol group (-SH) is unable to donate a proton to the chemically inert Gly, as it did with Asp. Because electrostatic stabilization is still needed, the -NH3+ terminal group was observed to spontaneously give its proton to N3-His10, becoming the uncharged -NH2 group. This is a step backwards in the reaction and supports the conclusion that splicing is impossible with the C2-Asp5Gly mutation. In addition, the presence of Glu (which is chemically similar to Asp) at that position results in some splicing products (albeit at a reduced reaction rate), consistent with this mechanistic assertion.

At this reaction step, C2-Asp5 is protonated, even though the typical pKa of the aspartate side chain is 3.9 [73]; at biological pH, the side chain should be negatively charged (COO-). We have shown that, because of the electrostatic effect of the -NH3+ terminal group, the Cys+1 thiol donates a proton to the Asp side chain, making it a carboxylic acid (COOH) despite the biological pH. This is important for the subsequent C-terminal cleavage step, which occurs after transesterification and was shown independently to occur more rapidly at lower-than-physiological pH [19] [23].
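A rough way to gauge how unusual a protonated C2-Asp5 is at biological pH is to ask how far the local environment must shift its pKa. For a side chain with pKa 3.9 at an assumed pH of 7.0, the protonated fraction follows from Henderson-Hasselbalch, and a pKa shift ΔpKa corresponds to a free-energy change of ln(10)·RT·ΔpKa. The sketch below applies these textbook relations; the numbers are back-of-the-envelope estimates, not results of the calculations described in this chapter.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of an acid in its protonated (COOH) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def pka_shift_energy(delta_pka: float, temperature: float = 298.15) -> float:
    """Free energy (kcal/mol) equivalent to shifting a pKa by delta_pka units."""
    return math.log(10.0) * R_KCAL * temperature * delta_pka


if __name__ == "__main__":
    pka_asp, ph = 3.9, 7.0  # typical Asp side-chain pKa [73]; assumed biological pH
    print(f"COOH fraction at pH {ph}: {fraction_protonated(pka_asp, ph):.1e}")
    # Shift needed for COOH and COO- to be equally populated (pKa = pH).
    shift = ph - pka_asp
    print(f"required pKa shift: {shift:.1f} units "
          f"= {pka_shift_energy(shift):.1f} kcal/mol")
```

Under these idealized assumptions, only about 0.08% of a solvent-exposed Asp would be protonated at pH 7.0, and making COOH as populated as COO- would require a local stabilization of roughly 4 kcal/mol.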

5.3.4 Transesterification 

Transesterification occurs when the Cys+1 thiolate attacks the carbonyl carbon of the N-terminal thioester (see Figure 5.7). This reaction step bridges the N- and C-exteins and breaks the bond between the N-extein and the intein. In the crystal structure, the N- and C-termini are approximately 8 Å apart, most likely because the crystal structure of the product protein does not include exteins. Because intein splicing does occur, we conclude that the spacing between the N- and C-termini is larger in the crystal structure than in the precursor, owing to charged end groups, solvent exposure, and possibly metal binding [ID 111 la O ISl ng E]. After N-terminal thioester formation, the activated C-extein Cys+1 thiolate was capable of attacking the N-terminal carbonyl, completing the transesterification step and fusing the N- and C-exteins. The C-extein and intein remain chemically bonded until the C1-Asn8 cyclization reaction.
Figure 5.6: With the C2-Asp5Gly mutation (D422G), the C-terminal Cys+1 cannot donate its proton to the C2-Asp5 side chain. To relieve the unfavorable electrostatic interaction between Cys+1 and the N-terminal -NH3+, the N-terminal group gives a proton back to the N3-His10 side chain. This result indicates that the cleavage mutant (CM) is unable to complete the transesterification step of splicing and that the overall splicing reaction is inhibited. N-terminal hydrolysis should still be possible, although it may occur much less frequently than pH-enhanced C-terminal cleavage.



At this point, both Cys residues (N-terminal N1-Cys1 and C-extein Cys+1) have been independently ionized and have undergone a rearrangement reaction as a thiolate. Shingledecker et al. [3] have shown that, for the Mtu recA intein, the pKa of N1-Cys1 is 8.2, which is comparable to average values for the Cys side chain [73]. For Cys+1, the pKa was 5.8, which is considerably lower and indicates an increased probability of finding Cys+1 as a negatively charged thiolate. The pKa of N1-Cys1 is not necessarily important for the splicing mechanism because the entire reaction must wait for this initial step to begin. However, the low pKa of Cys+1 is of great interest: the increased probability of finding a thiolate at Cys+1 suggests an electrostatic reason why this side chain is so readily ionized, which is consistent with the mechanism proposed in this work. The attacking Cys+1 group must wait for formation of the N-terminal thioester, which is much less stable than a peptide bond. The low pKa indicates that side chain ionization should be energetically easy, which should facilitate transesterification.

Figure 5.7: Joining of the N- and C-termini via transesterification. This reaction step is catalyzed once the C-terminal Cys+1 is a thiolate, which depends on the pKa of the side chain (discussed in text).
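The effect of the pKa difference reported by Shingledecker et al. can be illustrated with the Henderson-Hasselbalch relation, this time for the deprotonated (thiolate) fraction, 1/(1 + 10^(pKa - pH)). The sketch scans an assumed pH range of 5 to 9 and treats each Cys as an isolated titratable group, which ignores coupling to neighboring residues.

```python
def thiolate_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of a thiol present as thiolate (-S-)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# pKa values reported for the Mtu recA intein [3].
PKA = {"N1-Cys1": 8.2, "Cys+1": 5.8}

if __name__ == "__main__":
    print("pH   " + "   ".join(f"{name:>8}" for name in PKA))
    for ph in (5.0, 6.0, 7.0, 8.0, 9.0):
        row = "   ".join(f"{thiolate_fraction(pka, ph):8.3f}" for pka in PKA.values())
        print(f"{ph:3.1f}  {row}")
```

At pH 7.0 this gives roughly 6% thiolate for N1-Cys1 versus roughly 94% for Cys+1, about a sixteen-fold difference.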

5.3.5 C-terminal cleavage 

After transesterification, the subsequent reaction is C-terminal cleavage. In this reaction step, C1-Asn8 undergoes cyclization into a succinimide ring. Intein sequences and crystal structures indicate that either the penultimate C1-His7 residue or a C2-block Arg residue is necessary for stabilizing the oxyanion (acting as an oxyanion hole) during the C-terminal cleavage step within the splicing reaction [17] [92] [93] [94]. A hydrogen bond between the side chain -NH of either C1-His7 or N3-block Arg and the C=O of the scissile peptide bond is highly conserved in crystal structures, although proton transfer from this residue is not expected to catalyze C-terminal cleavage.

Based on previous experimental mutagenesis studies [19], splicing may be inhibited and C-terminal cleavage isolated by a C2-Asp5Gly mutation; the roles of Asp and Gly at this position are explained in the preceding sections. The C2-Asp5Gly mutant, termed the cleavage mutant (CM), also exhibited higher activity at low pH, for which a mechanism was proposed in earlier chapters. Within the context of splicing, no striking pH dependence was observed; C-terminal cleavage still occurs, but as a final reaction step after the splicing steps described above. It is remarkable that C-terminal cleavage, which is extremely rapid in the CM, is delayed until the N-terminal reaction steps of splicing are complete. This can be explained by returning our focus to C2-Asp422 (C2-Asp5), which is integral as the proton acceptor that triggers transesterification and thus to the overall splicing reaction rate.

After transesterification, C2-Asp422 carries a proton, making its side chain neutral (COOH). The average pKa of a solvent-exposed aspartic acid side chain is ~3.9, much lower than the pH levels at which splicing is considered. The additional proton makes C2-Asp422 an acid (aspartic acid), and because it is protonated, the effect of a low-pH environment is obtained locally despite the overall biological pH. This low-pH effect is important for C-terminal cleavage and is consistent with experimental results. The side chain COOH group, which forms only after the N-terminal N-S shift, acts as a catalytic acid and donates a proton to trigger the C-terminal cleavage reaction, the first truly irreversible step in the splicing reaction mechanism.

5.3.6 S-N shift (finishing reactions with succinimide hydrolysis) 

After C1-Asn8 cyclization triggered by proton donation from C2-Asp5, a succinimide ring is formed and the ligated N- and C-exteins are released. Two finishing reactions are expected (see Figure 5.8). First, the branched thioester undergoes an S-N shift and rearranges to form a peptide bond, which is energetically more stable. Second, the succinimide may undergo hydrolysis, which would create a typical C-terminus (-COO-) and either a normal Asn or an iso-Asp residue, although succinimide is observed in some crystal structures. For example, the last residue of the intein in the Mtu recA crystal structure was observed to be either a succinimide [67] [95] or a negatively charged carboxyl group. The presence of succinimide gave insight into possible mechanisms for the isolated C-terminal cleavage reaction [28] [29].

Figure 5.8: After C-terminal cleavage and before the finishing reactions, the N- and C-exteins are ligated and the intein is released from the host protein. The finishing reaction will replace the thioester with a peptide bond.



5.4 Conclusions: Splicing 

The mechanism for protein splicing of inteins was predicted based on first principles calculations. Using electronic structure methods, particularly density functional theory, the energies and structures along the reaction pathway were studied. Notable observations include the catalytic effect of the N-terminal -NH3+ group on the Cys+1 thiol group. In addition, the highly conserved N1-Cys1, N3-His10, C2-Asp5, C1-Asn8, and Cys+1 residues are all involved in the splicing mechanism. An explanation for the lack of splicing activity with the C2-Asp5Gly mutation is proposed, and the catalytic and essential role of C2-Asp5 as both an acid and a base is explained.



CHAPTER 6 
CONCLUSIONS 

6.1 Discussion 

Intein-based technology is promising because a useful function is directly encoded in the intein sequence. An intein cuts itself away from the host protein and joins its left and right neighbors in a predictable way. Depending on sequence mutations and environmental stimuli, the function of inteins may be tuned, controlled, inhibited, or modified. Inteins may be used as nanosensor devices, in which ligation of the exteins brings together two easily detected fluorescent proteins (as exteins). Split inteins may also be used for single molecule detection. In this case, a small molecule could trigger a detection device by bringing together the N- and C-intein fragments, which would then catalyze splicing.

Because the enzyme-like intein is a component of the host protein (substrate) sequence, typical molecular inhibitors are not effective. In standard enzyme catalysis, the substrate is usually a small molecule that can be replaced with a chemically similar but unreactive molecule, thereby inhibiting the reaction. For inteins, the only evidence of chemical inhibition comes from metal ions, which are unable to remain in the cell. Because of this complication, using inteins for biotechnological purposes has been difficult.

For this reason, it is important to understand the reaction mechanism at the atomic level. Understanding the role of protons depends closely on experimental crystal structures, random and site-directed mutagenesis, and NMR results. For example, NMR can complement the crystal structure by revealing the protonation states of side chains. A complication with inteins is that catalytic side chains such as Cys+1 belong to the exteins and are not included in the crystal structures. Mutagenesis experiments usually replace a normally active amino acid with something non-reactive and measure the effect of this mutation on the reaction rate. The splicing and cleavage mechanisms must be understood at the atomic level because this knowledge underlies the use of the intein as a nanoswitch, molecular sensor, or reporting device.
6.2 Results

We have used first principles density functional theory methods to investigate the reaction mechanisms of inteins: N-terminal thioester formation, transesterification, and C-terminal cleavage. Quantum mechanical methods allow accurate prediction of energies related to biological systems, such as ionization potentials, electron affinities, binding energies, and energy barriers, which can complement experimental investigations. We have used these methods to analyze the structure and energetics of intein systems.

6.2.1 C-terminal cleavage 

For the C-terminal cleavage mutant, we have proposed a mechanism for C-terminal 
cleavage based on C1-Asn7 cyclization and succinimide formation. Based on experimental
results that showed the reaction to occur more rapidly in a low pH environment, our 
proposed mechanism takes this pH effect into account. We have also performed energy 
barrier calculations using QM, semi-empirical QM, implicit solvent QM, and QM/MM 
simulations to accurately recreate the reaction profile. 

6.2.2 Splicing 

In addition to C-terminal cleavage, we have studied the reaction steps of intein splicing and found that the highly conserved N3-His10 plays an important role in accepting, then donating, a proton from N1-Cys1. After N-terminal thioester formation, the polarization effect of the N-terminal -NH3+ group forces the C-terminal Cys+1 to donate a proton to C2-Asp5. This step triggers transesterification, and the extra proton on C2-Asp5 catalyzes the C-terminal cleavage reaction in a way that mimics a low-pH environment. With the C2-Asp5Gly mutation (CM), the N-terminal -NH3+ group is not stabilized and returns to the neutral -NH2 terminal group, which is unable to catalyze the transesterification step, thus inhibiting the splicing reaction.

6.3 Future Research 

Future research will include understanding specific mutations affecting intein splicing and how these mutations may make the reaction faster or stimulus-dependent. For the Mtu recA intein, the C2-Asp5Gly mutation decouples the N- and C-termini, leading to isolated and rapid C-terminal cleavage. Also, the C-terminal Cys+1 residue could be mutated to each of the other amino acids and the calculated trends compared with experiment.

Splicing may still occur when the catalytic Cys residues are mutated to Ser/Thr, although with a lower reaction rate due to the decreased acidity of the Ser side chain (-OH) compared to Cys (-SH) [1]. The effect of single and double Cys to Ser mutations can be studied with first principles calculations, and the intein splicing reaction may be tuned either to control molecular docking related to inhibition or to build a nanosensor device.
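The acidity argument can be made concrete with the side-chain pKa values listed in Appendix A (8.35 for Cys and roughly 16 for Ser). The sketch below compares the fraction of each side chain present as the anionic nucleophile (thiolate or alkoxide) at an assumed pH of 7.0. It is a deliberately crude estimate: the ionized fraction is only one factor in nucleophilic reactivity, and the Ser value is approximate.

```python
import math


def anion_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of a side chain in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    ph = 7.0                          # assumed pH
    pka_cys, pka_ser = 8.35, 16.0     # approximate values from Appendix A
    f_cys = anion_fraction(pka_cys, ph)
    f_ser = anion_fraction(pka_ser, ph)
    print(f"Cys thiolate fraction: {f_cys:.2e}")
    print(f"Ser alkoxide fraction: {f_ser:.2e}")
    print(f"ratio ~ 10^{math.log10(f_cys / f_ser):.1f}")
```

On this estimate, a Cys side chain is more than seven orders of magnitude more likely than a Ser side chain to be ionized at pH 7.0, in line with the slower splicing expected for Cys to Ser mutants.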

For the Ssp dnaE split intein, the effect of an extein mutation (Tyr to Asp) will be studied, especially its effect on the protonation state of N3-His10, a putative proton acceptor/donor for N-terminal thioester formation. One method to understand the effect of mutation is to calculate NMR chemical shifts for specific atoms in the electrostatic environment perturbed by the mutation.
APPENDIX A
Amino Acid Chart

Twenty natural amino acids (source http://www.neb.com), grouped by side-chain class. MW and pKa values are as listed in the chart; pKa is given only where the chart lists one.

Class          Amino acid       Code     MW       pKa
Small          Glycine          Gly, G    57.05   -
Small          Alanine          Ala, A    71.09   -
Nucleophilic   Serine           Ser, S    87.08   ~16
Nucleophilic   Threonine        Thr, T   101.11   ~16
Nucleophilic   Cysteine         Cys, C   103.15   8.35
Hydrophobic    Valine           Val, V    99.14   -
Hydrophobic    Leucine          Leu, L   113.16   -
Hydrophobic    Isoleucine       Ile, I   113.16   -
Hydrophobic    Methionine       Met, M   131.19   -
Hydrophobic    Proline          Pro, P    97.12   -
Aromatic       Phenylalanine    Phe, F   147.18   -
Aromatic       Tyrosine         Tyr, Y   163.18   -
Aromatic       Tryptophan       Trp, W   186.21   -
Amide          Asparagine       Asn, N   114.11   -
Amide          Glutamine        Gln, Q   128.14   -
Acidic         Aspartic acid    Asp, D   115.09   3.9
Acidic         Glutamic acid    Glu, E   129.12   4.07
Basic          Histidine        His, H   137.14   6.04
Basic          Lysine           Lys, K   128.17   10.79
Basic          Arginine         Arg, R   156.19   12.48
APPENDIX B
Intein sequences, motifs, and numbering scheme